Acceleration platform for face detection and recognition based on field-programmable gate array

Journal of Jilin University (Engineering and Technology Edition), 2019, Vol. 49, Issue (6): 2051-2057. doi: 10.13229/j.cnki.jdxbgxb20180480

You ZHOU (周柚)^1,2, Sen YANG (杨森)^1,2, Da-lin LI (李大琳)^1,2, Chun-guo WU (吴春国)^1,2, Yan WANG (王岩)^1,2, Kang-ping WANG (王康平)^1,2

1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
2. Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China

Received: 2018-05-15 Online: 2019-11-01 Published: 2019-11-08
Contact: Kang-ping WANG E-mail: zyou@jlu.edu.cn; wangkp@jlu.edu.cn
About the author: You ZHOU (1979-), male, associate professor, doctoral supervisor. Research interests: heterogeneous computing, pattern recognition. E-mail: zyou@jlu.edu.cn
Supported by:
National Natural Science Foundation of China (61772227); Key Research and Development Project of the Jilin Province Science and Technology Development Plan (20180201045GX); Project of the Jilin Province Key Laboratory of Big Data Intelligent Computing (20180622002JC)

Abstract

A heterogeneous computing technique that combines a field-programmable gate array (FPGA) with a central processing unit (CPU) is used to accelerate face detection and recognition. The Viola-Jones face detection algorithm is accelerated with concurrency and pipelining, which improves data throughput and increases the parallelism of the cascaded classifiers. The convolutional neural network computation is accelerated by concurrent convolution operations and pipelined feature maps. The experimental results show that the hardware platform achieves a speedup of 2.9 times over the software platform.

Key words: computer application, face detection, face recognition, convolutional neural network, field-programmable gate array (FPGA) algorithm, algorithm hardware
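
The abstract names the Viola-Jones detector without describing its internals. As background only, the following Python sketch shows the integral image and a two-rectangle Haar-like feature, the standard building blocks that a Viola-Jones cascade evaluates in every detection window. The function names are illustrative, and the sketch makes no claim about how the authors' FPGA design implements these steps.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Haar-like feature: left half of the window minus right half (w must be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    image = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    table = integral_image(image)
    print(rect_sum(table, 1, 1, 2, 2))          # sum of the central 2 x 2 block
    print(two_rect_feature(table, 0, 0, 4, 2))  # left-minus-right over the top two rows
```

Because every rectangle sum costs four table lookups regardless of its size, many features can be evaluated independently of one another, which is the kind of work that lends itself to concurrent and pipelined evaluation.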

CLC number: TP338

You ZHOU, Sen YANG, Da-lin LI, Chun-guo WU, Yan WANG, Kang-ping WANG. Acceleration platform for face detection and recognition based on field-programmable gate array[J]. Journal of Jilin University (Engineering and Technology Edition), 2019, 49(6): 2051-2057.

Hardware	Model
Host PC CPU	Intel(R) CPU i5-2310, 2.9 GHz, 4 cores
Host PC memory	8 GB
Host PC operating system	Ubuntu 16.04, 64-bit
GPU	GeForce GTX 1060
FPGA model	Xilinx KCU105

Software	Version
OpenCV	V3.4
Vivado HLS	V16.04
Vivado	V16.04
Keras	V2.4

Layer name	Input/output shape	Number of parameters
input(Input)	(None, 28, 28, 1)	0
conv2d_1(Conv2D)	(None, 26, 26, 8)	80
conv2d_2(Conv2D)	(None, 24, 24, 16)	1 168
Norm(Norm)	(None, 24, 24, 16)	0
mp2d_1(MP2d)	(None, 12, 12, 16)	0
conv2d_3(Conv2D)	(None, 10, 10, 16)	2 320
Dropout_1(Dropout)	(None, 10, 10, 16)	0
flatten_1(Flatten)	(None, 1600)	0
dense_1(Dense)	(None, 32)	51 232
dropout_2(Dropout)	(None, 32)	0
dense_2(Dense)	(None, 10)	330
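
The shapes and parameter counts in the layer table above can be reproduced with a few lines of arithmetic. The sketch below assumes 3 x 3 kernels, stride 1, no padding, and 2 x 2 max pooling; these settings are not stated on this page and are inferred from the shapes (28 to 26 to 24, then 24 to 12). The helper names are hypothetical, and the normalization and dropout layers are left out because they change neither the shape nor the parameter count.

```python
def conv2d(shape, filters, k=3):
    """Assumed k x k convolution, stride 1, no padding: (output shape, parameters)."""
    h, w, c = shape
    # Each filter has k*k*c weights plus one bias.
    return (h - k + 1, w - k + 1, filters), k * k * c * filters + filters


def max_pool(shape, p=2):
    """Assumed p x p max pooling with stride p: no trainable parameters."""
    h, w, c = shape
    return (h // p, w // p, c), 0


def dense(n_in, n_out):
    """Fully connected layer: n_in * n_out weights plus n_out biases."""
    return (n_out,), n_in * n_out + n_out


def summarize(input_shape=(28, 28, 1)):
    """Return a list of (layer name, output shape, parameter count) tuples."""
    rows = []
    shape = input_shape
    for name, filters in [("conv2d_1", 8), ("conv2d_2", 16)]:
        shape, params = conv2d(shape, filters)
        rows.append((name, shape, params))
    shape, params = max_pool(shape)
    rows.append(("mp2d_1", shape, params))
    shape, params = conv2d(shape, 16)
    rows.append(("conv2d_3", shape, params))
    n = shape[0] * shape[1] * shape[2]  # flatten
    rows.append(("flatten_1", (n,), 0))
    for name, units in [("dense_1", 32), ("dense_2", 10)]:
        out_shape, params = dense(n, units)
        rows.append((name, out_shape, params))
        n = units
    return rows


if __name__ == "__main__":
    for name, shape, params in summarize():
        print(f"{name}\t{shape}\t{params}")
```

Under these assumptions every row matches the table, for example 3 x 3 x 8 x 16 + 16 = 1 168 for conv2d_2 and 1600 x 32 + 32 = 51 232 for dense_1.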

Logic resource	Used	Available	Utilization/%
LUT	195 044	242 400	76.39
REGISTER	352 370	484 800	64.43
DSP	376	1 920	14.95
BRAM	556	600	89.17

Number of faces	CPU platform		CPU+FPGA platform		Speedup
	t/ms	f₀/(frames·s^-1)	t/ms	f₀/(frames·s^-1)	
1	125.62	7.96	43.08	23.81	2.991
2	129.19	7.74	44.07	22.69	2.931
4	136.79	7.31	45.55	21.95	3.002
8	142.24	7.03	48.01	20.83	2.963
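
The columns of the timing table above are related by simple formulas. The sketch below assumes that the frame rate is the reciprocal of the per-frame time and that the speedup is the ratio of CPU time to CPU+FPGA time; the page does not state these definitions, and the function names are illustrative. It uses the two-face and eight-face rows as inputs.

```python
def frame_rate(t_ms):
    """Frames per second for a per-frame processing time given in milliseconds."""
    return 1000.0 / t_ms


def speedup(t_cpu_ms, t_fpga_ms):
    """Ratio of CPU-only time to CPU+FPGA time (assumed definition)."""
    return t_cpu_ms / t_fpga_ms


if __name__ == "__main__":
    # (number of faces, CPU time in ms, CPU+FPGA time in ms), taken from the table
    rows = [(2, 129.19, 44.07), (8, 142.24, 48.01)]
    for faces, t_cpu, t_fpga in rows:
        print(
            faces,
            round(frame_rate(t_cpu), 2),
            round(frame_rate(t_fpga), 2),
            round(speedup(t_cpu, t_fpga), 3),
        )
```

For these two rows the computed values agree with the tabulated frame rates and speedups to the printed precision.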
