Journal of Jilin University (Information Science Edition) ›› 2025, Vol. 43 ›› Issue (4): 724-735.


Multi-View Stereo Network with Adaptive Depth Consistency and Cross-Frequency Attention

XING Hang, WANG Gang, WANG Yan, HOU Minghui

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2024-08-15  Online: 2025-08-15  Published: 2025-08-14
  • Corresponding author: WANG Gang (1981—), male, from Changchun, professor and doctoral supervisor at Jilin University; his research covers artificial intelligence, robotics, and autonomous driving. (Tel) 86-18604465858 (E-mail) gangwang@jlu.edu.cn
  • About the first author: XING Hang (1999—), male, from Xuancheng, Anhui, master's student at Jilin University; his research focuses on deep-learning-based 3D reconstruction. (Tel) 86-13605638511 (E-mail) xh13605638511@outlook.com
  • Funding:
    Supported by the Jilin Provincial Science and Technology Development Program (3D813N801421)

ADCFA-MVSNet: Multi-View Stereo with Adaptive Depth Consistency and Cross-Frequency Attention 

XING Hang, WANG Gang, WANG Yan, HOU Minghui

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2024-08-15  Online: 2025-08-15  Published: 2025-08-14

Abstract: To address the difficulty current deep-learning methods face in extracting comprehensive scene information from images for 3D reconstruction, and their insufficient consideration of depth consistency between views, a multi-view stereo network with adaptive depth consistency and cross-frequency attention (ADCFA-MVSNet: Multi-View Stereo with Adaptive Depth Consistency and Cross-Frequency Attention) is proposed. The cross-frequency attention (CFA: Cross-Frequency Attention) module integrates high- and low-frequency information within images as well as global scene information across views, extracting image features more comprehensively. The adaptive depth consistency (AD: Adaptive Depth Consistency) module precisely captures the geometric structure of the scene, dynamically weighs the contribution of different views to depth consistency, and enhances depth consistency at multiple scales. The innovation lies in exploiting comprehensive image information while ensuring geometric consistency, thereby achieving excellent performance on 3D reconstruction tasks. Experimental results show an accuracy of 0.319, a completeness of 0.285, and an overall score of 0.302 on the DTU (Technical University of Denmark) dataset, outperforming the compared methods. On the BlendedMVS dataset, the EPE (End-Point-Error) score is 0.27, the e1 score is 5.28, and the e3 score is 1.84, likewise outperforming the compared methods. These results demonstrate the effectiveness of ADCFA-MVSNet in improving the completeness and accuracy of multi-view 3D reconstruction, yielding high-quality reconstruction results.

Keywords: computer vision, multi-view stereo, deep learning, cross-frequency attention, adaptive depth consistency

Abstract: Current deep-learning methods for 3D reconstruction struggle to extract comprehensive scene information from images and insufficiently consider depth consistency between views. To address these problems, a multi-view stereo network with adaptive depth consistency and cross-frequency attention (ADCFA-MVSNet: Multi-View Stereo with Adaptive Depth Consistency and Cross-Frequency Attention) is proposed. The CFA (Cross-Frequency Attention) module integrates high- and low-frequency information within images and global scene information across views, enabling more comprehensive feature extraction. The AD (Adaptive Depth) consistency module precisely captures the geometric structure of the scene and dynamically weighs the contribution of each view to depth consistency, enhancing it across scales. The innovation of this method lies in utilizing comprehensive image information to ensure geometric consistency, achieving excellent performance in 3D reconstruction tasks. On the DTU (Technical University of Denmark) dataset, the method achieves an accuracy of 0.319, a completeness of 0.285, and an overall score of 0.302, surpassing other methods. On the BlendedMVS dataset, the EPE (End-Point-Error) score is 0.27, the e1 score is 5.28, and the e3 score is 1.84, again outperforming the compared methods. These results demonstrate the effectiveness of ADCFA-MVSNet in improving the completeness and accuracy of multi-view 3D reconstruction.
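To illustrate the frequency-splitting idea behind the CFA module, the sketch below decomposes a feature map into a low-frequency component (block average pooling) and a high-frequency residual, then recombines them with softmax attention weights. Both the decomposition and the energy-based weighting here are illustrative assumptions for exposition, not the paper's exact operators.

```python
import numpy as np

def split_frequencies(feat, k=2):
    """Split a 2D feature map into low- and high-frequency parts.

    Low frequency: k x k block average pooling followed by nearest
    upsampling; high frequency: the residual. Illustrative only --
    not the decomposition used in ADCFA-MVSNet.
    """
    h, w = feat.shape
    # average each k x k block, then broadcast back to full resolution
    pooled = feat.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
    low = np.repeat(np.repeat(pooled, k, axis=0), k, axis=1)
    high = feat - low  # residual carries edges and fine detail
    return low, high

def cross_frequency_attention(feat, k=2):
    """Reweight the two frequency bands by their global mean energy
    (softmax over mean absolute response), then recombine."""
    low, high = split_frequencies(feat, k)
    energies = np.array([np.abs(low).mean(), np.abs(high).mean()])
    weights = np.exp(energies) / np.exp(energies).sum()  # softmax
    return weights[0] * low + weights[1] * high

feat = np.arange(16, dtype=float).reshape(4, 4)
out = cross_frequency_attention(feat)
print(out.shape)  # (4, 4)
```

By construction the two bands sum back to the original map, so the attention weights only redistribute emphasis between coarse structure and fine detail; in the actual network such weights would be learned rather than derived from band energy.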

Key words: computer vision, multi-view stereo, deep learning, cross-frequency attention, adaptive depth consistency

CLC number:

  • TP391.4