Journal of Jilin University (Engineering and Technology Edition) ›› 2025, Vol. 55 ›› Issue (2): 709-721. doi: 10.13229/j.cnki.jdxbgxb.20230543

• Computer Science and Technology •

Three-dimensional object detection algorithm based on multi-scale candidate fusion and optimization

Hua CAI¹, Yan-yang ZHENG¹, Qiang FU², Sheng-yu WANG¹, Wei-gang WANG³, Zhi-yong MA³

  1. School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
  2. Institute of Space Optoelectronic Technology, Changchun University of Science and Technology, Changchun 130022, China
  3. No. 2 Department of Urology, The First Hospital of Jilin University, Changchun 130061, China
  • Received: 2023-05-30  Online: 2025-02-01  Published: 2025-04-16
  • About the author: Hua CAI (born 1977), male, associate professor, Ph.D. Research interests: machine learning, pattern recognition. E-mail: caihua@cust.edu.cn
  • Funding: Major Program of the National Natural Science Foundation of China (61890963); Science and Technology Development Plan Project of Jilin Province (20210204099YY); Special Project for Medical and Health Talents of Jilin Province (JLSWSRCZX2023-70)

Abstract:

In point cloud object detection, proposals generated from a single low-resolution feature map tend to miss targets, and keypoint sampling tends to include a large number of background points. To address both problems, this paper proposes an improved algorithm based on the PV-RCNN network. A region proposal fusion network and weighted non-maximum suppression (NMS) fuse the proposals generated at different scales and eliminate redundancy. A segmentation network separates foreground points from the raw point cloud, and object center positions are determined from the proposals. A Gaussian density function then estimates regional density and assigns different sampling weights, which eases the difficulty of sampling in sparse regions. Experiments on the KITTI dataset show that, at the moderate difficulty level, the average precision for cars, pedestrians, and cyclists improves over the baseline by 0.39, 1.31, and 0.63 percentage points, respectively. Generalization experiments were also conducted on the Waymo Open dataset. The results indicate that the proposed algorithm achieves higher detection accuracy than most existing 3D object detection algorithms.

Key words: computer vision, 3D object detection, region proposal fusion, weighted non-maximum suppression, keypoint sampling

CLC number: TP391

Fig. 1 System architecture

Fig. 2 Flowchart of weighted non-maximum suppression
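
To make the idea behind Fig. 2 concrete, the following is a minimal sketch of a generic weighted NMS, not the authors' implementation. It assumes axis-aligned bird's-eye-view boxes in `[x1, y1, x2, y2]` form and score-only weights, whereas the paper works with 3D proposals and its exact weighting is not given on this page. The names `iou_xyxy`, `weighted_nms`, and `iou_thr` are hypothetical.

```python
import numpy as np

def iou_xyxy(box, boxes):
    """IoU between one axis-aligned box and an array of boxes ([x1, y1, x2, y2])."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def weighted_nms(boxes, scores, iou_thr=0.5):
    """Merge overlapping boxes into score-weighted averages instead of discarding them."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)              # highest score first
    boxes, scores = boxes[order], scores[order]
    remaining = np.ones(len(boxes), dtype=bool)
    merged_boxes, merged_scores = [], []
    for i in range(len(boxes)):
        if not remaining[i]:
            continue
        # Cluster: the current top box plus every unmerged box overlapping it.
        cluster = remaining & (iou_xyxy(boxes[i], boxes) >= iou_thr)
        w = scores[cluster]
        # Assumption: weights are the raw scores; other variants use score * IoU.
        merged_boxes.append((boxes[cluster] * w[:, None]).sum(axis=0) / w.sum())
        merged_scores.append(scores[i])      # keep the top score of the cluster
        remaining[cluster] = False
    return np.array(merged_boxes), np.array(merged_scores)
```

Unlike standard NMS, which keeps only the top-scoring box of each cluster, this variant lets lower-scoring proposals from other scales still influence the final box coordinates.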

Fig. 3 Foreground point segmentation network

Fig. 4 Schematic of keypoint sampling based on center-point density
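
The sampling idea in Fig. 4 can be sketched roughly as follows. This is an illustrative reading of the abstract, not the paper's formula: it assumes the density around each proposal center is a sum of Gaussian kernels over the foreground points, that each point inherits the density of its nearest center, and that the sampling weight is the inverse of that density. The names `density_sampling_weights`, `sample_keypoints`, and `sigma` are hypothetical.

```python
import numpy as np

def density_sampling_weights(points, centers, sigma=1.0):
    """Per-point sampling probabilities that favor points near sparse centers."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Squared distance between every point and every center: shape (N, M).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian kernel density estimate around each center: shape (M,).
    density = np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=0)
    # Each point inherits the density of its nearest center (an assumption).
    nearest = d2.argmin(axis=1)
    weights = 1.0 / (density[nearest] + 1e-8)
    return weights / weights.sum()

def sample_keypoints(points, centers, num_keypoints, sigma=1.0, seed=0):
    """Draw keypoints without replacement using the density-based probabilities."""
    points = np.asarray(points, dtype=float)
    probs = density_sampling_weights(points, centers, sigma)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=num_keypoints, replace=False, p=probs)
    return points[idx]
```

With these weights, a distant object covered by only a few points is less likely to be skipped than under uniform sampling, which matches the motivation stated in the abstract.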

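Table 1 Criteria for the three difficulty levels in the KITTI detection task

| Level | Easy | Moderate | Hard |
|---|---|---|---|
| Min. bounding box height | 40 px | 25 px | 25 px |
| Max. occlusion level | Fully visible | Partly occluded | Difficult to see |
| Max. truncation | 15% | 30% | 50% |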
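
The thresholds in Table 1 can be read as a simple cascade. The sketch below is an illustration under stated assumptions: the function name `kitti_difficulty` is hypothetical, and the integer occlusion encoding (0 = fully visible, 1 = partly occluded, 2 = difficult to see) is an assumption that does not appear on this page.

```python
def kitti_difficulty(height_px, occlusion, truncation):
    """Return the easiest level whose Table 1 limits the object satisfies, or None."""
    # (name, min box height in pixels, max occlusion level, max truncation ratio)
    levels = [
        ("easy", 40, 0, 0.15),
        ("moderate", 25, 1, 0.30),
        ("hard", 25, 2, 0.50),
    ]
    for name, min_height, max_occlusion, max_truncation in levels:
        if (height_px >= min_height
                and occlusion <= max_occlusion
                and truncation <= max_truncation):
            return name
    return None  # too small, too occluded, or too truncated for any level
```

For example, a fully visible 30-pixel-high box with 10% truncation fails the 40-pixel requirement of the easy level and is therefore counted as moderate.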

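Table 2 Quantitative comparison of detection performance of different methods on the KITTI test set

3D AP/%: Car (IoU=0.7), Pedestrian (IoU=0.5), Cyclist (IoU=0.5)

| Algorithm | Type | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard |
|---|---|---|---|---|---|---|---|---|---|---|
| VoxelNet [11] | One-stage | 77.47 | 65.11 | 57.73 | 39.48 | 33.69 | 31.50 | 61.22 | 48.36 | 44.37 |
| SECOND [12] | One-stage | 84.65 | 75.96 | 68.71 | 45.31 | 35.52 | 33.14 | 75.83 | 60.82 | 53.67 |
| PointPillars [29] | One-stage | 82.58 | 74.31 | 68.99 | 51.45 | 41.92 | 38.89 | 77.10 | 58.65 | 51.92 |
| Point-GNN [30] | One-stage | 88.33 | 79.47 | 72.29 | 51.92 | 43.77 | 40.14 | 78.60 | 63.48 | 57.08 |
| IA-SSD [27] | One-stage | 88.34 | 80.13 | 75.04 | 46.51 | 39.03 | 35.61 | 78.35 | 61.94 | 55.70 |
| TANet [31] | Two-stage | 84.39 | 75.94 | 68.82 | 53.72 | 44.34 | 40.49 | 75.70 | 59.44 | 52.53 |
| Part-A² [32] | Two-stage | 87.81 | 78.49 | 73.51 | 53.10 | 43.35 | 40.06 | 79.17 | 63.52 | 56.93 |
| PointRCNN [21] | Two-stage | 86.96 | 75.64 | 70.70 | 47.98 | 39.37 | 36.01 | 74.96 | 58.82 | 52.53 |
| STD [16] | Two-stage | 87.95 | 79.71 | 75.09 | 53.29 | 42.47 | 38.35 | 78.69 | 61.59 | 55.30 |
| PV-RCNN [17] | Two-stage | 90.25 | 81.43 | 76.82 | 52.17 | 43.29 | 40.29 | 78.60 | 63.71 | 57.65 |
| Ours | Two-stage | 90.32 | 81.82 | 77.11 | 53.86 | 44.60 | 40.84 | 79.52 | 64.34 | 58.03 |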

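Table 3 3D mAP of different algorithms on the KITTI validation set under the R11 criterion

| Algorithm | Year | Easy mAP3D/% | Moderate mAP3D/% | Hard mAP3D/% |
|---|---|---|---|---|
| SECOND [12] | 2018 | 88.61 | 78.62 | 77.22 |
| PointPillars [29] | 2019 | 86.62 | 76.06 | 68.91 |
| STD [16] | 2019 | 89.70 | 79.80 | 79.30 |
| PointRCNN [21] | 2019 | 88.88 | 78.63 | 77.38 |
| Part-A² [32] | 2020 | 89.47 | 79.47 | 78.54 |
| PV-RCNN [17] | 2020 | 89.35 | 83.69 | 78.70 |
| VoxelRCNN [33] | 2021 | 89.41 | 84.52 | 78.93 |
| JPV-Net [34] | 2022 | 89.71 | 84.61 | 79.09 |
| Ours | 2025 | 90.54 | 84.72 | 79.81 |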
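
For readers unfamiliar with the metric in Table 3, the sketch below assumes that "R11" refers to the 11-recall-position interpolated average precision: the mean, over recall thresholds 0, 0.1, ..., 1.0, of the best precision achieved at or above each threshold. The function name `ap_r11` and its inputs are hypothetical; this is not the benchmark's evaluation code.

```python
import numpy as np

def ap_r11(recall, precision):
    """11-point interpolated AP from matched recall/precision samples."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    total = 0.0
    for i in range(11):
        threshold = i / 10                  # 0.0, 0.1, ..., 1.0
        mask = recall >= threshold
        # Interpolated precision: best precision at any recall >= threshold.
        total += precision[mask].max() if mask.any() else 0.0
    return total / 11.0
```

With recall samples `[0.2, 0.5, 1.0]` and precision samples `[1.0, 0.8, 0.5]`, three thresholds see precision 1.0, three see 0.8, and five see 0.5, giving 7.9 / 11.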

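Table 4 Runtime comparison of different algorithms on the KITTI dataset

| Algorithm | VoxelNet [11] | Point-RCNN [21] | PV-RCNN [17] | STD [16] | EQ-PVRCNN [35] | Ours |
|---|---|---|---|---|---|---|
| Runtime/ms | 220 | 100 | 80 | 80 | 200 | 140 |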

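Table 5 Quantitative comparison of detection performance on the Waymo Open validation set

All entries are mAP/mAPH in %.

| Algorithm | Vehicle LEVEL 1 | Vehicle LEVEL 2 | Pedestrian LEVEL 1 | Pedestrian LEVEL 2 | Cyclist LEVEL 1 | Cyclist LEVEL 2 |
|---|---|---|---|---|---|---|
| SECOND [12] | 72.3/71.7 | 63.9/63.3 | 68.7/58.2 | 60.7/51.3 | 60.6/59.3 | 58.3/57.0 |
| PointPillars [29] | 72.1/71.5 | 63.6/63.1 | 70.6/56.7 | 62.8/50.3 | 64.4/62.3 | 61.9/59.9 |
| CenterPoint [36] | 76.6/76.0 | 68.9/68.4 | 79.0/73.4 | 71.0/65.8 | 72.1/71.0 | 69.5/68.5 |
| IA-SSD [27] | 70.5/69.7 | 61.6/60.9 | 69.4/58.5 | 60.3/50.7 | 67.7/65.3 | 65.0/62.7 |
| CenterFormer [37] | 75.2/74.7 | 70.2/69.7 | 78.6/73.0 | 73.6/68.3 | 72.3/71.3 | 69.8/68.8 |
| VoxSeT [38] | 74.5/74.0 | 66.0/65.6 | 80.0/72.4 | 72.5/65.4 | 71.6/70.3 | 69.0/67.8 |
| PV-RCNN [17] | 77.5/76.9 | 69.0/68.4 | 75.0/65.6 | 66.0/57.6 | 67.8/66.4 | 65.4/64.0 |
| Ours | 78.5/78.1 | 72.5/71.8 | 80.1/75.3 | 73.8/67.8 | 74.3/73.6 | 70.9/68.7 |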

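Table 6 Comparison of vehicle detection results at different ranges on the Waymo Open dataset

Vehicle 3D mAP (IoU=0.7)

| Model | Year | 0-30 m | 30-50 m | 50 m-Inf |
|---|---|---|---|---|
| PV-RCNN [17] | 2020 | 91.92 | 69.21 | 42.17 |
| Voxel-RCNN [33] | 2021 | 92.49 | 74.09 | 53.15 |
| CT3D [39] | 2021 | 92.51 | 75.07 | 55.36 |
| VoxSeT [38] | 2022 | 91.13 | 75.75 | 54.23 |
| Ours | 2025 | 92.10 | 75.81 | 55.42 |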

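Fig. 5 Visualization examples of detection results on the KITTI dataset

Fig. 6 Visualization examples of detection results on the Waymo Open dataset

Fig. 7 Comparison of detection results between the proposed algorithm and the baseline in distant sparse regions on the KITTI dataset

Fig. 8 Visual comparison between the proposed algorithm and current state-of-the-art algorithms on the Waymo Open dataset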

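Table 8 Ablation study results

mAP (moderate)/% for car, pedestrian, and cyclist

| Algorithm | Region proposal fusion | Weighted NMS | Regional density sampling | Car | Pedestrian | Cyclist |
|---|---|---|---|---|---|---|
| Baseline (PV-RCNN) | | | | 81.43 | 43.29 | 63.71 |
| Experimental network 1 | | | | 81.60 | 44.24 | 64.05 |
| Experimental network 2 | | | | 81.76 | 44.35 | 64.09 |
| Ours | | | | 81.82 | 44.70 | 64.34 |
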
References:

1 Qian R, Garg D, Wang Y, et al. End-to-end pseudo-lidar for image-based 3d object detection[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 5881-5890.
2 Wang Z, Huang Z, Fu J, et al. Object as query: equipping any 2D object detector with 3D detection ability[J]. arXiv Preprint, 2023: arXiv:2301.02364.
3 Tao Bo, Yan Fu-wu, Yin Zhi-shuai, et al. 3D object detection algorithm based on high-precision map enhancement[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(3): 802-809.
4 Yang Z, Zhou Y, Chen Z, et al. 3D-man: 3D multi-frame attention network for object detection[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 1863-1872.
5 Li Y, Yu A W, Meng T, et al. Deepfusion: lidar-camera deep fusion for multi-modal 3d object detection[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 17182-17191.
6 Cai Hua, Kou Ting-ting, Yang Yi-ning, et al. Three-dimensional vehicle multiple target tracking based on trajectory optimization[J]. Journal of Jilin University (Engineering and Technology Edition), 2024, 54(8): 2338-2347.
7 Zheng A, Zhang Y, Zhang X, et al. Progressive end-to-end object detection in crowded scenes[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 857-866.
8 Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 779-788.
9 Waleed A, Sherif A, Mahmoud Z, et al. Yolo3D: end-to-end real-time 3D oriented object bounding box detection from lidar point cloud[C]∥Computer Vision-ECCV 2018 Workshops, Munich, Germany, 2018: 716-728.
10 Zhou Y, Sun P, Zhang Y, et al. End-to-end multi-view fusion for 3D object detection in lidar point clouds[C]∥Proceedings of the Conference on Robot Learning, Cambridge, USA, 2020: 923-932.
11 Zhou Y, Tuzel O. Voxelnet: end-to-end learning for point cloud based 3d object detection[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 4490-4499.
12 Yan Y, Mao Y, Li B. Second: sparsely embedded convolutional detection[J]. Sensors, 2018, 18(10): No.18103337.
13 Qi C R, Su H, Mo K, et al. Pointnet: deep learning on point sets for 3d classification and segmentation[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 652-660.
14 Qi C R, Yi L, Su H, et al. Pointnet++: deep hierarchical feature learning on point sets in a metric space[C]∥Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 5105-5114.
15 Qi C R, Liu W, Wu C, et al. Frustum pointnets for 3d object detection from rgb-d data[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 918-927.
16 Yang Z, Sun Y, Liu S, et al. Std: sparse-to-dense 3d object detector for point cloud[C]∥Proceedings of The IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 2019: 1951-1960.
17 Shi S, Guo C, Jiang L, et al. PV-RCNN: point-voxel feature set abstraction for 3d object detection[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10529-10538.
18 Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]∥2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, Rhode Island, USA, 2012: 3354-3361.
19 Sun P, Kretzschmar H, Dotiwalla X, et al. Scalability in perception for autonomous driving: Waymo open dataset[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 2446-2454.
20 Ye M, Xu S, Cao T. Hvnet: hybrid voxel network for lidar based 3D object detection[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 1631-1640.
21 Shi S, Wang X, Li H. Pointrcnn: 3D object proposal generation and detection from point cloud[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 770-779.
22 Liu Z, Tang H, Lin Y, et al. Point-voxel CNN for efficient 3D deep learning[C]∥Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 965-975.
23 Tian Feng, Jiang Wen-wen, Liu Fang, et al. Hybrid element and original point cloud 3D target detection method[J]. Journal of Chongqing University of Technology (Natural Science), 2022, 36(11): 108-117.
24 Shi S, Jiang L, Deng J, et al. PV-RCNN++: point-voxel feature set abstraction with local vector representation for 3D object detection[J]. International Journal of Computer Vision, 2023, 131(2): 531-551.
25 Che Yun-long, Yuan Liang, Sun Li-hui. 3D object detection method based on strong semantic key point sampling[J]. Computer Engineering and Applications, 2024, 60(9): 254-260.
26 He C, Zeng H, Huang J, et al. Structure aware single-stage 3D object detection from point cloud[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 11873-11882.
27 Zhang Y, Hu Q, Xu G, et al. Not all points are equal: learning highly efficient point-based detectors for 3D lidar point clouds[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 18953-18962.
28 Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]∥Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2980-2988.
29 Lang A H, Vora S, Caesar H, et al. Pointpillars: fast encoders for object detection from point clouds[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 12697-12705.
30 Shi W, Rajkumar R. Point-GNN: graph neural network for 3d object detection in a point cloud[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 1711-1719.
31 Liu Z, Zhao X, Huang T, et al. Tanet: robust 3D object detection from point clouds with triple attention[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 11677-11684.
32 Shi S, Wang Z, Shi J, et al. From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(8): 2647-2664.
33 Deng J, Shi S, Li P, et al. Voxel R-CNN: towards high performance voxel-based 3d object detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(2): 1201-1209.
34 Song N, Jiang T, Yao J. JPV-Net: joint point-voxel representations for accurate 3D object detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(2): 2271-2279.
35 Yang Z, Jiang L, Sun Y, et al. A unified query-based paradigm for point cloud understanding[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 8541-8551.
36 Yin T, Zhou X, Krahenbuhl P. Center-based 3D object detection and tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 11784-11793.
37 Zhou Z, Zhao X, Wang Y, et al. Centerformer: center-based transformer for 3D object detection[C]∥European Conference on Computer Vision, Tel Aviv, Israel, 2022: 496-513.
38 He C, Li R, Li S, et al. Voxel set transformer: a set-to-set approach to 3D object detection from point clouds[C]∥2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 8417-8427.
39 Sheng H, Cai S, Liu Y, et al. Improving 3D object detection with channel-wise transformer[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 2743-2752.