Design of YOLOv2 Accelerator Based on FPGA

Journal of Jilin University (Information Science Edition), 2021, Vol. 39, Issue 4: 445-450.

LIANG Hongwei¹, BAI Pengcheng¹, CHEN Jianling², SUN Qinjiang², CHEN Minghu¹, XUE Xiangkai¹

1. School of Electrical Engineering and Information, Northeast Petroleum University, Daqing 163318, China; 2. Tianjin Branch, CNOOC China Limited, Tianjin 300459, China

Received: 2021-01-27 Online: 2021-07-24 Published: 2021-08-07
About the author: LIANG Hongwei (1978- ), male, born in Wucheng, Shandong; associate professor at Northeast Petroleum University, Ph.D.; research interests: oil and gas information processing and artificial intelligence. (Tel) 86-13936964989 (E-mail) lianghongwei@nepu.edu.cn
Funding:
Supported by the National Fund Cultivation Fund of Northeast Petroleum University (2018GPYB-03)

Abstract

Abstract: Convolutional neural networks (CNNs) are computationally intensive, so hardware acceleration is needed to process data quickly. Exploiting the parallel computing architecture of the field programmable gate array (FPGA), an FPGA-based parallel computing acceleration strategy is proposed. The strategy comprises three methods: distributing data sensibly between on-chip and off-chip memory to reduce data read latency; accelerating the convolution operation with a multi-channel parallel pipelined structure; and sharing data within convolutional layers to reduce memory access latency. The convolutional neural network YOLOv2 is accelerated on the PYNQ-Z2 development platform to detect and recognize target objects. The design achieves a processing capacity of 27.03 GOP/s (giga operations per second). Compared with a CPU (E5-2620 v4), its processing capacity is 6.57 times that of the CPU and its power consumption is 3% of the CPU's.
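The abstract names channel-parallel convolution and data sharing, but this page gives no architectural detail. The following is only a minimal software sketch of the general loop-tiling idea, not the authors' design. The function names `conv_direct` and `conv_tiled`, the tile sizes `tm` and `tn`, and the stride-1, no-padding layer are all assumptions made for illustration. Each input-channel tile is copied once into a modelled on-chip buffer and then reused for every output-channel tile, which is one common way to share data and cut off-chip reads.

```python
import numpy as np

def conv_direct(x, w):
    """Reference convolution. x: (C_in, H, W), w: (C_out, C_in, K, K); stride 1, no padding."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for m in range(c_out):
        for n in range(c_in):
            for i in range(k):
                for j in range(k):
                    y[m] += w[m, n, i, j] * x[n, i:i + h_out, j:j + w_out]
    return y

def conv_tiled(x, w, tm=4, tn=2):
    """Tiled convolution: tm output and tn input channels per step (hypothetical tile sizes)."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    input_tile_loads = 0
    for n0 in range(0, c_in, tn):
        # Model one read of an input-channel tile from off-chip memory into an on-chip buffer.
        in_buf = x[n0:n0 + tn].copy()
        input_tile_loads += 1
        # The buffered tile is shared by every output-channel tile before being replaced.
        for m0 in range(0, c_out, tm):
            w_buf = w[m0:m0 + tm, n0:n0 + tn]
            for i in range(k):
                for j in range(k):
                    window = in_buf[:, i:i + h_out, j:j + w_out]
                    # In hardware the tm x tn multiply-accumulates could run in parallel;
                    # einsum stands in for that array of multipliers here.
                    y[m0:m0 + tm] += np.einsum("mn,nhw->mhw", w_buf[:, :, i, j], window)
    return y, input_tile_loads

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8, 8))
w = rng.standard_normal((8, 6, 3, 3))
y_tiled, loads = conv_tiled(x, w)
print(np.allclose(y_tiled, conv_direct(x, w)), loads)
```

With 6 input channels and `tn=2`, the input is fetched in 3 tiles regardless of how many output channels there are. In an FPGA implementation the tile sizes would be bounded by the available on-chip memory and multiplier resources.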

Key words: convolutional neural network (CNN), field programmable gate array (FPGA), target detection, hardware acceleration
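The figures quoted in the abstract can be combined in a short worked calculation. This is only a back-of-the-envelope sketch: it assumes the 6.57x ratio and the 3% power figure refer to the same workload and are exact, which this page does not confirm. The variable names are illustrative.

```python
# Values taken from the abstract.
fpga_gops = 27.03        # FPGA processing capacity in GOP/s
speedup = 6.57           # FPGA capacity relative to the CPU (E5-2620 v4)
power_fraction = 0.03    # FPGA power as a fraction of CPU power

# Implied CPU processing capacity (derived, not reported in the abstract).
cpu_gops = fpga_gops / speedup

# Implied gain in operations per unit of energy (assumption: same workload).
efficiency_gain = speedup / power_fraction

print(f"implied CPU capacity: {cpu_gops:.2f} GOP/s")
print(f"implied energy-efficiency gain: {efficiency_gain:.0f}x")
```

Under these assumptions the CPU capacity works out to roughly 4.1 GOP/s, and the FPGA design performs on the order of 200 times more operations per unit of energy.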

CLC Number: TP391

LIANG Hongwei, BAI Pengcheng, CHEN Jianling, SUN Qinjiang, CHEN Minghu, XUE Xiangkai. Design of YOLOv2 Accelerator Based on FPGA[J]. Journal of Jilin University (Information Science Edition), 2021, 39(4): 445-450.
