Journal of Jilin University (Science Edition) ›› 2026, Vol. 64 ›› Issue (3): 591-602.

Three-Dimensional Object Detection Algorithm Based on Multi-level Interactive Feature Fusion

GAO Kai1, WANG Shengyu1, FU Qiang2, CAI Hua1, ZHANG Chenjie1, WANG Weigang3

  1. School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China; 2. Institute of Opto-Electronic Engineering, Changchun University of Science and Technology, Changchun 130022, China; 3. Department of Urology No.2, The First Bethune Hospital of Jilin University, Changchun 130061, China
  • Received: 2025-01-26 Online: 2026-05-26 Published: 2026-05-26
  • Corresponding author: CAI Hua E-mail: caihua@cust.edu.cn

Abstract: To address difficult small-object recognition, sparse point clouds at long range, and insufficient multimodal feature fusion in three-dimensional object detection for autonomous driving, we propose an improved algorithm built on a multimodal three-dimensional object detection framework. A class- and centroid-aware foreground point sampling strategy preserves more foreground information and suppresses background noise. A dynamic convolution mechanism for image feature extraction improves the quality of the image feature representation. A multi-stage interactive feature attention fusion module strengthens deep collaborative modeling between point cloud and image features. Experimental results on a public dataset show that the proposed algorithm achieves average detection accuracies of 83.49%, 46.98%, and 68.28% for cars, pedestrians, and cyclists, respectively, and that its overall performance exceeds that of current mainstream methods. The method effectively improves the accuracy and robustness of three-dimensional object detection in complex traffic scenarios and offers a useful reference for the development of environment perception technology for autonomous driving.

Key words: computer vision, key point sampling, dynamic convolution, 3D object detection, multimodal fusion

CLC number:

  • TP391
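
The abstract names a class- and centroid-aware foreground point sampling strategy but does not give its formulation. As a rough illustration of the general idea only, the sketch below keeps the points whose combined score is highest. The function name `sample_foreground_points`, the two per-point score inputs, and the product scoring rule are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence


def sample_foreground_points(
    class_scores: Sequence[float],
    centroid_scores: Sequence[float],
    num_samples: int,
) -> List[int]:
    """Return indices of the num_samples points with the highest combined score.

    Hypothetical scoring rule (an assumption, not the paper's formula):
    combined = class_score * centroid_score, so a point must both look like
    foreground and lie near an object centre to be kept.
    """
    if len(class_scores) != len(centroid_scores):
        raise ValueError("score lists must have the same length")
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    combined = [c * g for c, g in zip(class_scores, centroid_scores)]
    # Sort indices by descending combined score; ties keep the lower index first.
    order = sorted(range(len(combined)), key=lambda i: (-combined[i], i))
    return order[:num_samples]


# Toy input: five points with made-up scores.
class_scores = [0.9, 0.1, 0.8, 0.7, 0.2]     # predicted foreground confidence
centroid_scores = [0.5, 0.9, 0.9, 0.1, 0.3]  # closeness to an object centre
kept = sample_foreground_points(class_scores, centroid_scores, num_samples=2)
print(kept)  # [2, 0]
```

In this toy case, point 3 has a high class score but a low centroid score and is dropped, while point 1 is near a centre but unlikely to be foreground and is dropped as well. A product rule of this kind is one simple way to require both cues at once; a real detector would predict the two scores with learned network heads.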