Journal of Jilin University (Information Science Edition) ›› 2021, Vol. 39 ›› Issue (4): 445-450.

Design of a YOLOv2 Accelerator Based on FPGA

LIANG Hongwei1, BAI Pengcheng1, CHEN Jianling2, SUN Qinjiang2, CHEN Minghu1, XUE Xiangkai1

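  1. School of Electrical Engineering and Information, Northeast Petroleum University, Daqing, Heilongjiang 163318, China; 2. Tianjin Branch, CNOOC China Limited, Tianjin 300459, China
  • Received: 2021-01-27 Online: 2021-07-24 Published: 2021-08-07
  • About the author: LIANG Hongwei (born 1978), male, from Wucheng, Shandong; associate professor at Northeast Petroleum University, Ph.D.; research interests: oil and gas information processing and artificial intelligence. (Tel) 86-13936964989 (E-mail) lianghongwei@nepu.edu.cn
  • Supported by:
    Northeast Petroleum University National Fund Cultivation Fund Project (2018GPYB-03)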

Design of YOLOv2 Accelerator Based on FPGA

LIANG Hongwei1 , BAI Pengcheng1 , CHEN Jianling2 , SUN Qinjiang2 , CHEN Minghu1 , XUE Xiangkai1   

  1. School of Electrical Engineering and Information, Northeast Petroleum University, Daqing 163318, China; 2. Tianjin Branch, CNOOC China Limited, Tianjin 300459, China
  • Received:2021-01-27 Online:2021-07-24 Published:2021-08-07

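Abstract: Convolutional neural networks (CNNs) are computationally intensive, so hardware acceleration is needed to process data quickly. Exploiting the parallel computing architecture of the field programmable gate array (FPGA), an FPGA-based parallel computing acceleration strategy is proposed. The strategy comprises three methods: distributing data sensibly between on-chip memory and off-chip storage to reduce data read latency; accelerating convolution with a multi-channel parallel pipeline structure; and sharing convolutional-layer data to reduce memory access latency. The YOLOv2 convolutional neural network is accelerated on the PYNQ-Z2 development platform to detect and recognize target objects. The design achieves a throughput of 27.03 GOP/s (giga operations per second, i.e. 10^9 operations per second), which is 6.57 times that of a CPU (E5-2620v4), while consuming 3% of the CPU's power.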
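The three figures quoted above (27.03 GOP/s, a 6.57-fold throughput ratio, and a 3% power ratio) imply two further quantities that the abstract does not state: the CPU's throughput and the gain in throughput per watt. The short calculation below derives them. Both derived values are inferences from rounded figures, not measurements reported by the authors, and the variable names are chosen here for illustration.

```python
# Figures quoted in the abstract.
fpga_gops = 27.03       # FPGA throughput in GOP/s
speedup_vs_cpu = 6.57   # FPGA throughput divided by CPU throughput
power_ratio = 0.03      # FPGA power divided by CPU power

# Derived values (inferred here, not stated in the source).
cpu_gops = fpga_gops / speedup_vs_cpu            # implied CPU throughput
efficiency_gain = speedup_vs_cpu / power_ratio   # implied throughput-per-watt gain

print(f"implied CPU throughput: {cpu_gops:.2f} GOP/s")
print(f"implied throughput-per-watt gain: {efficiency_gain:.0f}x")
```

Under these assumptions the CPU delivers roughly 4.11 GOP/s, and the FPGA design is roughly 219 times more efficient per watt. Because the 3% figure is rounded, the efficiency gain should be read as an order-of-magnitude estimate.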

Key words: convolutional neural network; field programmable gate array; target detection; hardware acceleration

Abstract: A CNN (Convolutional Neural Network) requires a large amount of computation, so hardware acceleration is needed to achieve fast data processing. Based on the architectural characteristics of the FPGA (Field Programmable Gate Array), an FPGA-based parallel computing acceleration strategy is proposed. The specific methods of this strategy include: reducing data read latency by reasonably distributing data between on-chip and off-chip memory; accelerating the convolution operation with a multi-channel parallel pipeline; and reducing memory access latency through convolutional-layer data sharing. The PYNQ-Z2 development platform is used to accelerate the convolutional neural network YOLOv2 and to detect and identify target objects. The processing capacity of this design is 27.03 GOP/s, which is 6.57 times that of a CPU (E5-2620v4), and its power consumption is 3% of the CPU's.
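The abstract does not give the accelerator's loop structure, so the following is only a software sketch of the general idea behind channel-parallel convolution with a shared input buffer. It assumes stride 1, no padding, and integer data; the tile sizes `tn` (input channels) and `tm` (output channels) and both function names are hypothetical and not taken from the paper. A list slice stands in for the on-chip buffer, and the two innermost loops mark the work that a hardware design would typically unroll into parallel multiply-accumulate units.

```python
def conv_direct(x, w):
    """Reference convolution: x[C_in][H][W], w[C_out][C_in][K][K], stride 1, no padding."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    oh, ow = h - k + 1, wd - k + 1
    y = [[[0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):
        for r in range(oh):
            for c in range(ow):
                y[co][r][c] = sum(
                    w[co][ci][i][j] * x[ci][r + i][c + j]
                    for ci in range(c_in) for i in range(k) for j in range(k))
    return y


def conv_tiled(x, w, tn=2, tm=2):
    """Same convolution, processed in tiles of tn input and tm output channels."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    oh, ow = h - k + 1, wd - k + 1
    y = [[[0] * ow for _ in range(oh)] for _ in range(c_out)]
    input_tile_loads = 0
    for ci0 in range(0, c_in, tn):
        # Model one off-chip read: copy a tile of input channels into a buffer.
        in_buf = x[ci0:ci0 + tn]
        input_tile_loads += 1
        for co0 in range(0, c_out, tm):
            # The buffered input tile is reused by every output-channel tile.
            for r in range(oh):
                for c in range(ow):
                    for i in range(k):
                        for j in range(k):
                            # These two loops model up to tm * tn parallel MAC units.
                            for co in range(co0, min(co0 + tm, c_out)):
                                for t, plane in enumerate(in_buf):
                                    y[co][r][c] += (w[co][ci0 + t][i][j]
                                                    * plane[r + i][c + j])
    return y, input_tile_loads


# Small deterministic example: 3 input channels, 5x5 maps, 4 output channels, 3x3 kernels.
x = [[[(7 * ci + 3 * r + c) % 5 - 2 for c in range(5)] for r in range(5)]
     for ci in range(3)]
w = [[[[(co + 2 * ci + i - j) % 3 - 1 for j in range(3)] for i in range(3)]
      for ci in range(3)] for co in range(4)]

y_ref = conv_direct(x, w)
y_tiled, loads = conv_tiled(x, w, tn=2, tm=2)
print("tiled result matches reference:", y_tiled == y_ref)
print("input tiles loaded:", loads)
```

The point of the sketch is the loop order: each input tile is loaded once and then shared across all output-channel tiles, which is one common way to cut memory traffic. Whether the paper's design tiles along these same dimensions is not stated in the abstract.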

Key words: convolutional neural network (CNN), field programmable gate array (FPGA), target detection, hardware acceleration

CLC Number:

  • TP391