Three-Dimensional Object Detection Algorithm Based on Multi-level Interactive Feature Fusion

Journal of Jilin University (Science Edition), 2026, Vol. 64, Issue (3): 591-602.

GAO Kai¹, WANG Shengyu¹, FU Qiang², CAI Hua¹, ZHANG Chenjie¹, WANG Weigang³

1. School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China; 2. Institute of Opto-Electronic Engineering, Changchun University of Science and Technology, Changchun 130022, China; 3. Department of Urology No.2, The First Bethune Hospital of Jilin University, Changchun 130061, China

Received: 2025-01-26  Online: 2026-05-26  Published: 2026-05-26
Corresponding author: CAI Hua  E-mail: caihua@cust.edu.cn

Abstract: To address the difficulty of recognizing small objects, the sparsity of point clouds at long range, and the insufficient fusion of multimodal features in three-dimensional (3D) object detection for autonomous driving, we proposed an improved algorithm built on a multimodal 3D object detection framework. First, a class- and centroid-aware foreground point sampling strategy was constructed to better preserve foreground information and suppress background noise. Second, a dynamic convolution mechanism was introduced into image feature extraction to improve the quality of image feature representations. Third, a multi-stage interactive feature attention fusion module was designed to strengthen deep collaborative modeling between point cloud features and image features. Experimental results on a public dataset show that the proposed algorithm achieves average detection precision of 83.49%, 46.98% and 68.28% for cars, pedestrians and cyclists, respectively, outperforming current mainstream methods in overall performance. The method effectively improves the accuracy and robustness of 3D object detection in complex traffic scenes and can serve as a reference for the development of environment perception technology for autonomous driving.

Key words: computer vision, key point sampling, dynamic convolution, 3D object detection, multimodal fusion
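The abstract names a class- and centroid-aware foreground point sampling strategy but gives no formula. The sketch below illustrates the general idea under stated assumptions and is not the authors' method: each point is ranked by the product of its best foreground-class probability (assumed to come from a per-point classification head) and a centroid-proximity score, and the top k points are kept. The centroid score here is an assumed FCOS-style centerness extended to three axes; in training such a score is typically computed from ground-truth boxes as a supervision target, and at inference a network head would predict it. The names `centerness` and `class_centroid_aware_sample`, the axis-aligned box layout, and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical axis-aligned box (cx, cy, cz, dx, dy, dz); heading angle ignored for brevity.
Box = Tuple[float, float, float, float, float, float]


def centerness(p: Point, box: Box) -> float:
    """FCOS-style centerness extended to 3D (assumed formula).

    Returns 1.0 at the box centre, decays toward the faces, and 0.0 for
    points on or outside the box.
    """
    centre, size = box[:3], box[3:]
    score = 1.0
    for coord, c, d in zip(p, centre, size):
        lo = coord - (c - d / 2)  # distance to the lower face on this axis
        hi = (c + d / 2) - coord  # distance to the upper face on this axis
        if lo <= 0 or hi <= 0:
            return 0.0
        score *= min(lo, hi) / max(lo, hi)
    # Geometric mean of the three per-axis ratios.
    return score ** (1.0 / 3.0)


def class_centroid_aware_sample(
    class_probs: Sequence[Sequence[float]],
    centroid_scores: Sequence[float],
    k: int,
) -> List[int]:
    """Pick k point indices ranked by (best foreground class prob) * (centroid score).

    class_probs[i][c]: probability that point i belongs to foreground class c
    (e.g. car, pedestrian, cyclist), assumed to come from a per-point head.
    centroid_scores[i]: closeness of point i to its object centre, in [0, 1].
    """
    combined = [max(probs) * cen for probs, cen in zip(class_probs, centroid_scores)]
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    box = (0.0, 0.0, 0.0, 4.0, 2.0, 1.5)  # hypothetical car-sized box
    pts = [(0.0, 0.0, 0.0), (1.8, 0.9, 0.7), (5.0, 5.0, 0.0), (0.5, 0.2, 0.1)]
    probs = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.1, 0.2], [0.7, 0.2, 0.1]]
    cen = [centerness(p, box) for p in pts]
    print(class_centroid_aware_sample(probs, cen, k=2))  # [0, 3]
```

In this toy run, point 1 has a high class probability but sits near a corner of the box, so its low centroid score lets the more central point 3 outrank it, while the background point 2 falls outside the box and scores zero. Weighting by centroid proximity in this way is one plausible reading of how such a strategy could favor informative foreground points over edge and background points.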

CLC number: TP391

GAO Kai, WANG Shengyu, FU Qiang, CAI Hua, ZHANG Chenjie, WANG Weigang. Three-Dimensional Object Detection Algorithm Based on Multi-level Interactive Feature Fusion[J]. Journal of Jilin University (Science Edition), 2026, 64(3): 591-602.
