Single-stage rotated object detection network based on anchor transformation

doi:10.13229/j.cnki.jdxbgxb20211217

Abstract

Existing object detection methods face several problems when detecting traffic objects in UAV aerial images: horizontal bounding boxes fit the true contours of rotated objects poorly, heavily overlapping horizontal bounding boxes suppress one another, and the sampling points of a standard 2D convolution fall outside the object when it is rotated. To solve these problems, a single-stage rotated object detection network based on anchor transformation, ATB-YOLO, is proposed on the basis of the single-stage detector YOLOv3. For the backbone, a new feature extraction network, Darknet-53-Dense, is designed: the Mish activation function replaces the Leaky ReLU activation function of Darknet-53, and concatenation blocks replace the residual blocks, following DenseNet. In the detection head, an Anchor Transformation Net (ATN) transforms the initial horizontal anchors into rotated anchors, and an Anchor Aligned Convolution (AAC) adjusts the sampling positions of the convolution under the guidance of the rotated anchors. The resulting anchor-aligned feature maps are then used to predict the rotated bounding box and the category of each object. Experimental results show that the proposed backbone improves detection accuracy by 1.38%. The proposed AAC improves accuracy by 4.38%, 4.24% and 3.79% compared with standard convolution, deformable convolution and guided-anchoring deformable convolution, respectively. Compared with several mainstream rotated object detection methods, the proposed method reaches a near-real-time detection speed of 21.2 frames/s while achieving accuracy comparable to that of two-stage detectors.

Key words: computer application, UAV aerial image, rotated object detection, deep learning, feature alignment
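The activation swap described in the abstract can be illustrated with a minimal sketch. It uses the standard definition Mish(x) = x · tanh(softplus(x)); the Leaky ReLU slope of 0.1 is an assumption (a common Darknet setting), and the function names are illustrative, not taken from the paper.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def leaky_relu(x: float, negative_slope: float = 0.1) -> float:
    """Leaky ReLU; the slope 0.1 is an assumed value, not stated in the source."""
    return x if x >= 0.0 else negative_slope * x


if __name__ == "__main__":
    # Compare the two activations at a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  mish={mish(x):+.4f}  leaky_relu={leaky_relu(x):+.4f}")
```

Unlike Leaky ReLU, Mish is smooth at zero and non-monotonic for negative inputs, which is the property the original Mish paper emphasizes.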

CLC number: TP391

You QU, Wen-hui LI. Single-stage rotated object detection network based on anchor transformation[J]. Journal of Jilin University (Engineering and Technology Edition), 2022, 52(1): 162-173.





Module	Baseline	Proposed method, setting 1	Proposed method, setting 2	Proposed method, setting 3
Darknet-D	-	√	√	√
ATN	-	-	√	√
AAC	-	-	-	√
mAP/%	84.17	85.33	86.41	90.79
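The contribution of each module can be read off the ablation table by differencing adjacent columns. The sketch below does that arithmetic, assuming the four mAP values correspond to cumulative configurations in the column order shown; the labels and the helper name are illustrative.

```python
# mAP (%) of the four cumulative configurations, copied from the ablation table.
ablation_map = [
    ("baseline", 84.17),
    ("+ Darknet-D", 85.33),
    ("+ ATN", 86.41),
    ("+ AAC", 90.79),
]


def incremental_gains(rows):
    """Return (label, gain) pairs: each configuration's mAP minus the previous one's."""
    return [
        (label, round(curr - prev, 2))
        for (_, prev), (label, curr) in zip(rows, rows[1:])
    ]


gains = incremental_gains(ablation_map)
print(gains)
# -> [('+ Darknet-D', 1.16), ('+ ATN', 1.08), ('+ AAC', 4.38)]
```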

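Method	Small-vehicle detection accuracy/%	Large-vehicle detection accuracy/%	mAP/%	FLOPs/10¹¹
Standard convolution	87.20	85.62	86.41	2.89
Deformable convolution	87.39	85.71	86.55	2.91
Guided-anchoring deformable convolution	87.85	85.78	86.82	2.90
Anchor aligned convolution	90.24	91.35	90.79	2.91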
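To make the idea behind anchor aligned convolution concrete, the sketch below computes where the taps of a k×k kernel would land if they were spread over a rotated anchor instead of a regular axis-aligned grid. This is only one plausible formulation: the source does not give the exact AAC equations, so the grid layout, the angle convention and the function names (`aligned_sampling_points`, `sampling_offsets`) are assumptions.

```python
import math


def aligned_sampling_points(cx, cy, w, h, theta, k=3):
    """Return k*k (x, y) sampling positions spread over a rotated anchor.

    (cx, cy) is the anchor centre, (w, h) its size and theta its rotation
    in radians. Assumption: the taps sit at the centres of a regular
    k x k grid of cells tiling the anchor, and the grid rotates with it.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    points = []
    for row in range(k):
        for col in range(k):
            # Centre of grid cell (row, col) in the anchor's own frame.
            u = (col - (k - 1) / 2) / k * w
            v = (row - (k - 1) / 2) / k * h
            # Rotate into image coordinates and shift to the anchor centre.
            points.append((cx + u * cos_t - v * sin_t,
                           cy + u * sin_t + v * cos_t))
    return points


def sampling_offsets(cx, cy, w, h, theta, k=3):
    """Offsets from the regular k x k grid at (cx, cy) to the aligned points."""
    offsets = []
    for idx, (ax, ay) in enumerate(aligned_sampling_points(cx, cy, w, h, theta, k)):
        row, col = divmod(idx, k)
        # Position of the same tap in a standard, unit-spaced convolution.
        rx = cx + (col - (k - 1) / 2)
        ry = cy + (row - (k - 1) / 2)
        offsets.append((ax - rx, ay - ry))
    return offsets


if __name__ == "__main__":
    # A 6 x 3 anchor rotated by 30 degrees, centred at (10, 10).
    for dx, dy in sampling_offsets(10.0, 10.0, 6.0, 3.0, math.radians(30)):
        print(f"{dx:+.3f} {dy:+.3f}")
```

Offsets of this kind are the sort of input a deformable-convolution operator consumes. The difference from deformable convolution is that here they are derived from the anchor geometry rather than learned freely, which matches the abstract's description of sampling "under the guidance of the rotated anchors".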

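Network	ATN depth (layers)	Detection head depth (layers)	mAP/%	FLOPs/10¹¹	Parameters/10⁷
Baseline	-	-	72.33	2.43	3.55
Proposed method	1	1	89.54	1.80	3.31
Proposed method	2	2	90.79	2.90	3.63
Proposed method	1	2	89.03	2.35	3.47
Proposed method	2	1	89.25	2.35	3.47
Proposed method	3	3	90.01	4.00	3.95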

Type	Method	Number of anchors	mAP/%	FPS
Two-stage	Gliding Vertex	20	90.14	10.0
Two-stage	CenterMap-Net	15	91.39	6.6
Single-stage	R3Det	21	88.19	18.5
Single-stage	Proposed method	1	90.79	21.2
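The speed column is easier to compare as per-frame latency. The short sketch below converts the FPS values in the comparison table to milliseconds per frame; the numbers are copied from the table, and the helper name `latency_ms` is illustrative.

```python
# (method, mAP %, FPS) copied from the comparison table.
detectors = [
    ("Gliding Vertex", 90.14, 10.0),
    ("CenterMap-Net", 91.39, 6.6),
    ("R3Det", 88.19, 18.5),
    ("Proposed (ATB-YOLO)", 90.79, 21.2),
]


def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


for name, m_ap, fps in detectors:
    print(f"{name:<20} mAP={m_ap:.2f}%  {latency_ms(fps):6.1f} ms/frame")
```

At 21.2 frames/s the proposed method spends roughly 47 ms per frame, against 100 ms for Gliding Vertex and about 152 ms for CenterMap-Net.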