Journal of Jilin University (Engineering and Technology Edition) ›› 2025, Vol. 55 ›› Issue (5): 1682-1691. doi: 10.13229/j.cnki.jdxbgxb.20230820

• Computer Science and Technology •

Self-supervised monocular depth estimation based on improved DenseNet and wavelet decomposition

De-qiang CHENG1, Wei-chen WANG1, Cheng-gong HAN1, Chen LYU1, Qi-qi KOU2

  1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
  2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
  • Received: 2023-08-04  Online: 2025-05-01  Published: 2025-07-18
  • Contact: Qi-qi KOU  E-mail: chengdq@cumt.edu.cn; kouqiqi@cumt.edu.cn
  • About the first author: CHENG De-qiang (1979-), male, professor, Ph.D. Research interests: machine vision and pattern recognition, intelligent image detection and information processing. E-mail: chengdq@cumt.edu.cn
  • Funding: National Natural Science Foundation of China (52204177); National Natural Science Foundation of China (52304182); Fundamental Research Funds for the Central Universities (2020QN49)



Abstract:

Traditional self-supervised monocular depth estimation models extract and fuse shallow features insufficiently, which often leads to missed detections of small objects and blurred object edges. To address these problems, this paper proposes a self-supervised monocular depth estimation model based on an improved dense network and wavelet decomposition. The model follows the U-Net structure: the encoder adopts an improved DenseNet, strengthening feature extraction and fusion; a detail enhancement module is inserted into the skip connections to further refine and integrate the multi-scale features produced by the encoder; and the decoder incorporates wavelet decomposition, forcing it to focus on high-frequency information and thereby sharpen image edges. Experimental results demonstrate that the proposed model captures the features of small objects more effectively and generates depth maps with clearer, more accurate edges.

Key words: signal and information processing, depth estimation, self-supervision, DenseNet, wavelet decomposition, detail enhancement

CLC number: TP391.41

Fig. 1 Architecture of the proposed depth estimation network
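To make the architecture in Fig. 1 concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: a U-Net-style network whose encoder features pass through detail enhancement modules (DEM) on the skip connections before being fused by the decoder. This is an illustrative reconstruction, not the authors' code: `EncoderStub`, `DEMStub`, the channel widths, and the plain sigmoid disparity head are placeholder assumptions (the actual encoder follows Table 1, and the actual decoder predicts wavelet coefficients, sketched after Fig. 3).

```python
import torch
import torch.nn as nn

class EncoderStub(nn.Module):
    """Placeholder for the improved DenseNet encoder: emits feature maps
    at 1/2, 1/4, 1/8 and 1/16 of the input resolution."""
    def __init__(self, chs=(64, 128, 256, 512)):
        super().__init__()
        stages, in_ch = [], 3
        for out_ch in chs:
            stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            in_ch = out_ch
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats

class DEMStub(nn.Module):
    """Placeholder detail enhancement module on a skip connection:
    a residual refinement of the encoder feature."""
    def __init__(self, ch):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, f):
        return f + self.refine(f)

class DepthNetSketch(nn.Module):
    """U-Net assembly: encoder -> DEM-refined skips -> decoder that
    upsamples and fuses skips, ending in a 1-channel disparity map."""
    def __init__(self, chs=(64, 128, 256, 512)):
        super().__init__()
        self.encoder = EncoderStub(chs)
        self.dems = nn.ModuleList(DEMStub(c) for c in chs)
        ups, in_ch = [], chs[-1]
        for skip_ch in reversed(chs[:-1]):
            ups.append(nn.Conv2d(in_ch + skip_ch, skip_ch, 3, padding=1))
            in_ch = skip_ch
        self.ups = nn.ModuleList(ups)
        self.head = nn.Conv2d(chs[0], 1, 3, padding=1)

    def forward(self, x):
        feats = [dem(f) for dem, f in zip(self.dems, self.encoder(x))]
        y = feats[-1]
        for up, skip in zip(self.ups, reversed(feats[:-1])):
            y = nn.functional.interpolate(y, scale_factor=2, mode="nearest")
            y = torch.relu(up(torch.cat([y, skip], dim=1)))
        return torch.sigmoid(self.head(y))  # disparity in (0, 1)

depth = DepthNetSketch()(torch.randn(1, 3, 192, 640))
print(depth.shape)  # torch.Size([1, 1, 96, 320]): half input resolution
```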

Table 1 Structural parameters of the encoder network

| Layer | Scale | DenseNet121-D | DenseNet169-D |
| Convolution | S/2 | 7×7 conv, 64, stride 2 | (same) |
| Dense block (1) | S/2 | [1×1 conv → 3×3 conv] × 6, stride 1 | (same) |
| Transition layer (1) | S/4 | 1×1 conv, 128, stride 1; 2×2 average pooling, stride 2 | (same) |
| Dense block (2) | S/4 | [1×1 conv → 3×3 conv] × 12, stride 1 | (same) |
| Transition layer (2) | S/8 | 1×1 conv, 256, stride 1; 2×2 average pooling, stride 2 | (same) |
| Dense block (3) | S/8 | [1×1 conv → 3×3 conv] × 24, stride 1 | [1×1 conv → 3×3 conv] × 32, stride 1 |
| Transition layer (3) | S/16 | 1×1 conv, 512, stride 1; 2×2 average pooling, stride 2 | 1×1 conv, 640, stride 1; 2×2 average pooling, stride 2 |
| Dense block (4) | S/16 | [1×1 conv → 3×3 conv] × 16, stride 1 (1024 ch) | [1×1 conv → 3×3 conv] × 32, stride 1 (1664 ch) |
| Downsample layer | S/32 | Batch norm, ReLU, 2×2 average pooling, stride 2 | (same) |
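Read as code, each "Dense block" row in Table 1 stacks N bottleneck units of the form 1×1 conv → 3×3 conv with dense concatenative connectivity [23], and each "Transition layer" compresses channels with a stride-1 1×1 conv before halving resolution with 2×2 average pooling. A minimal sketch, assuming DenseNet's standard growth rate of 32 and bottleneck factor of 4 (neither value is restated on this page):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One "1x1 conv -> 3x3 conv" unit from Table 1; its output is
    concatenated onto everything that came before (dense connectivity)."""
    def __init__(self, in_ch, growth, bottleneck=4):
        super().__init__()
        mid = bottleneck * growth
        self.fn = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, growth, kernel_size=3, padding=1, bias=False))

    def forward(self, x):
        return torch.cat([x, self.fn(x)], dim=1)

def dense_block(in_ch, growth, n_layers):
    """n_layers = 6 for Dense block (1), 12 for (2), 24/32 for (3)."""
    layers, ch = [], in_ch
    for _ in range(n_layers):
        layers.append(DenseLayer(ch, growth))
        ch += growth
    return nn.Sequential(*layers), ch

def transition(in_ch, out_ch):
    """Transition layer from Table 1: 1x1 conv (stride 1) to compress
    channels, then 2x2 average pooling (stride 2) to halve resolution."""
    return nn.Sequential(
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2))

# Dense block (1) of DenseNet121-D: 64 input channels, growth rate 32.
block1, ch = dense_block(64, growth=32, n_layers=6)  # ch == 64 + 6*32 == 256
trans1 = transition(ch, 128)                         # -> 128 ch, half res
x = torch.randn(1, 64, 48, 160)
print(trans1(block1(x)).shape)                       # [1, 128, 24, 80]
```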

Fig. 2 Structure of the detail enhancement module (DEM)

Fig. 3 Inverse discrete wavelet transform
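The inverse discrete wavelet transform in Fig. 3 is what lets the decoder double resolution at each stage: a low-frequency band LL plus three high-frequency detail bands (LH, HL, HH) at scale S reconstruct the signal at scale 2S, and because the high-frequency bands are sparse away from depth edges, the decoder can concentrate on edge regions. A minimal sketch with PyWavelets; the Haar wavelet and the random coefficients are assumptions for illustration (Haar is what WaveletMonodepth [21] uses; this page does not restate the paper's choice):

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
LL = rng.random((24, 80))        # coarse low-frequency band at scale S
LH = rng.random((24, 80)) * 0.1  # horizontal detail band
HL = rng.random((24, 80)) * 0.1  # vertical detail band
HH = rng.random((24, 80)) * 0.1  # diagonal detail band

# One inverse DWT step: four bands at scale S -> one map at scale 2S.
depth_2s = pywt.idwt2((LL, (LH, HL, HH)), "haar")
print(depth_2s.shape)            # (48, 160): resolution doubled
```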

Fig. 4 Self-supervised depth estimation framework
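The self-supervised framework in Fig. 4 follows the usual formulation in this line of work (e.g. Monodepth2 [8]): a source frame is warped into the target view using the predicted depth and relative camera pose, and the reconstruction is scored by a weighted SSIM + L1 photometric error. A hedged sketch of the loss term alone; the warping and pose network are omitted, and `alpha = 0.85` is the Monodepth2 default, not necessarily this paper's value:

```python
import torch
import torch.nn.functional as F

def photometric_loss(pred, target, alpha=0.85):
    """Per-pixel photometric error between a warped source frame and the
    target frame: alpha * (1 - SSIM)/2 + (1 - alpha) * |diff|."""
    l1 = (pred - target).abs().mean(1, keepdim=True)
    # SSIM with a 3x3 averaging window, as in Monodepth2.
    mu_p = F.avg_pool2d(pred, 3, 1, 1)
    mu_t = F.avg_pool2d(target, 3, 1, 1)
    sig_p = F.avg_pool2d(pred * pred, 3, 1, 1) - mu_p ** 2
    sig_t = F.avg_pool2d(target * target, 3, 1, 1) - mu_t ** 2
    sig_pt = F.avg_pool2d(pred * target, 3, 1, 1) - mu_p * mu_t
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_p * mu_t + c1) * (2 * sig_pt + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (sig_p + sig_t + c2))
    ssim = ssim.clamp(0, 1).mean(1, keepdim=True)
    return alpha * (1 - ssim) / 2 + (1 - alpha) * l1

img = torch.rand(2, 3, 96, 320)
warped = torch.rand(2, 3, 96, 320)
print(photometric_loss(warped, img).shape)  # [2, 1, 96, 320]
```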

Table 2 Test results on the KITTI dataset

Supervision: S = stereo pairs, M = monocular video, MS = both (notation follows Monodepth2 [8]); HD marks the higher-resolution input setting. Error columns (AbsRel, SqRel, RMSE, RMSElog) are lower-is-better; accuracy columns (δ) are higher-is-better.

| Method | Supervision | AbsRel | SqRel | RMSE | RMSElog | δ<1.25 | δ<1.25² | δ<1.25³ |
| Garg [6] | S | 0.152 | 1.226 | 5.849 | 0.246 | 0.784 | 0.921 | 0.967 |
| Monodepth R50 [13] | S | 0.133 | 1.142 | 5.533 | 0.230 | 0.830 | 0.936 | 0.970 |
| StrAT [28] | S | 0.128 | 1.019 | 5.403 | 0.227 | 0.827 | 0.935 | 0.971 |
| 3Net R50 [29] | S | 0.129 | 0.996 | 5.281 | 0.223 | 0.831 | 0.939 | 0.974 |
| 3Net VGG [29] | S | 0.119 | 1.201 | 5.888 | 0.208 | 0.844 | 0.941 | 0.978 |
| SuperDepth [30] | S | 0.112 | 0.875 | 4.958 | 0.207 | 0.852 | 0.947 | 0.977 |
| VA-Depth [16] | M | 0.112 | 0.864 | 4.804 | 0.190 | 0.877 | 0.959 | 0.982 |
| Zeeshan [17] | M | 0.113 | 0.903 | 4.863 | 0.193 | 0.877 | 0.959 | 0.981 |
| STDepthFormer [18] | M | 0.110 | 0.805 | 4.678 | 0.187 | 0.878 | 0.961 | 0.983 |
| Monodepth2 [8] | S | 0.109 | 0.873 | 4.960 | 0.209 | 0.864 | 0.948 | 0.975 |
| WaveletMonodepth [21] (baseline) | S | 0.110 | 0.876 | 4.916 | 0.206 | 0.864 | 0.950 | 0.976 |
| Ours | S | 0.103 | 0.801 | 4.727 | 0.201 | 0.877 | 0.953 | 0.976 |
| Monodepth2 (HD) [8] | S | 0.107 | 0.849 | 4.764 | 0.201 | 0.874 | 0.953 | 0.977 |
| WaveletMonodepth (HD) [21] | S | 0.105 | 0.797 | 4.732 | 0.203 | 0.869 | 0.952 | 0.977 |
| Ours (HD) | S | 0.097 | 0.726 | 4.531 | 0.195 | 0.884 | 0.955 | 0.978 |
| Monodepth2 [8] | MS | 0.106 | 0.818 | 4.750 | 0.196 | 0.874 | 0.957 | 0.979 |
| WaveletMonodepth [21] | MS | 0.109 | 0.814 | 4.808 | 0.198 | 0.868 | 0.955 | 0.980 |
| Ours | MS | 0.100 | 0.731 | 4.536 | 0.190 | 0.882 | 0.959 | 0.980 |
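For reference, the metric columns in Tables 2–5 are the standard monocular depth metrics of Eigen et al. [10]. A small NumPy sketch of how they are computed over valid ground-truth pixels (function and variable names are ours):

```python
import numpy as np

def depth_metrics(d, g):
    """Standard KITTI depth metrics for predictions d and ground truth g
    (1-D arrays of valid depths, in metres)."""
    d, g = np.asarray(d, float), np.asarray(g, float)
    thresh = np.maximum(d / g, g / d)
    return {
        "AbsRel": np.mean(np.abs(d - g) / g),
        "SqRel": np.mean((d - g) ** 2 / g),
        "RMSE": np.sqrt(np.mean((d - g) ** 2)),
        "RMSElog": np.sqrt(np.mean((np.log(d) - np.log(g)) ** 2)),
        "δ<1.25": np.mean(thresh < 1.25),
        "δ<1.25²": np.mean(thresh < 1.25 ** 2),
        "δ<1.25³": np.mean(thresh < 1.25 ** 3),
    }

g = np.random.default_rng(1).uniform(1.0, 80.0, size=10000)    # fake GT
d = g * np.random.default_rng(2).uniform(0.9, 1.1, size=10000) # fake prediction
print({k: round(v, 3) for k, v in depth_metrics(d, g).items()})
```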

Table 3 Transfer test results on the CityScapes dataset

| Method | AbsRel | SqRel | RMSE | RMSElog |
| Monodepth R50 [13] | 0.210 | 2.230 | 9.430 | 0.311 |
| Monodepth2 [8] | 0.182 | 1.880 | 8.870 | 0.253 |
| WaveletMonodepth [21] | 0.185 | 1.950 | 8.659 | 0.248 |
| Ours | 0.175 | 1.831 | 8.590 | 0.242 |

Fig. 5 Visual comparison of results on the KITTI dataset

Fig. 6 Visual comparison of results on the CityScapes dataset

Table 4 Ablation study of the proposed modules

| Experiment | DenseNet-D | DEM | Wavelet | AbsRel | SqRel | RMSE | RMSElog | δ<1.25 | δ<1.25² | δ<1.25³ |
| 1 | × | × | × | 0.108 | 0.842 | 4.891 | 0.207 | 0.865 | 0.949 | 0.976 |
| 2 | × | × | ✓ | 0.110 | 0.876 | 4.916 | 0.206 | 0.864 | 0.950 | 0.976 |
| 3 | ✓ | × | ✓ | 0.107 | 0.865 | 4.891 | 0.204 | 0.867 | 0.950 | 0.976 |
| 4 | × | ✓ | ✓ | 0.105 | 0.810 | 4.739 | 0.202 | 0.876 | 0.952 | 0.976 |
| 5 | ✓ | ✓ | ✓ | 0.103 | 0.801 | 4.727 | 0.201 | 0.877 | 0.953 | 0.976 |

Table 5 Ablation study of the encoder design

| Experiment | Encoder | AbsRel | SqRel | RMSE | RMSElog | δ<1.25 | δ<1.25² | δ<1.25³ |
| 1 | DenseNet121 | 0.106 | 0.862 | 4.882 | 0.203 | 0.872 | 0.951 | 0.976 |
| 2 | DenseNet169 | 0.108 | 0.892 | 4.972 | 0.205 | 0.871 | 0.950 | 0.976 |
| 3 | DenseNet121-D | 0.103 | 0.801 | 4.727 | 0.201 | 0.877 | 0.953 | 0.976 |
| 4 | DenseNet169-D | 0.102 | 0.797 | 4.745 | 0.200 | 0.879 | 0.953 | 0.976 |
[1] Wang Xin-zhu, Li Jun, Li Hong-jian, et al. Obstacle detection based on 3D laser scanner and range image for intelligent vehicle[J]. Journal of Jilin University (Engineering and Technology Edition), 2016, 46(2): 360-365.
[2] Zhang Yu-xiang, Ren Shuang. Overview of the application of location technology in virtual reality[J]. Computer Science, 2021, 48(1): 308-318.
[3] Shi Xiao-gang, Xue Zheng-hui, Li Hui-hui, et al. Overview of augmented reality display technology[J]. China Optics, 2021, 14(5): 1146-1161.
[4] Eigen D, Fergus R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture[C]∥2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015: 2650-2658.
[5] Fu H, Gong M, Wang C, et al. Deep ordinal regression network for monocular depth estimation[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018: 2002-2011.
[6] Garg R, Vijay Kumar B G, Carneiro G, et al. Unsupervised CNN for single view depth estimation: geometry to the rescue[C]∥European Conference on Computer Vision (ECCV), Amsterdam, Netherlands, 2016: 740-756.
[7] Zhou T H, Brown M, Snavely N, et al. Unsupervised learning of depth and ego-motion from video[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, USA, 2017: 1851-1858.
[8] Godard C, Mac Aodha O, Firman M, et al. Digging into self-supervised monocular depth estimation[C]∥2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea, 2019: 3828-3838.
[9] Saxena A, Sun M, Ng A Y. Make3D: learning 3D scene structure from a single still image[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 31(5): 824-840.
[10] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network[C]∥Advances in Neural Information Processing Systems, Montreal, Canada, 2014: 2366-2374.
[11] Teed Z, Deng J. DeepV2D: video to depth with differentiable structure from motion[C]∥International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 2020: No.1812.04605.
[12] Ummenhofer B, Zhou H Z, Uhrig J, et al. DeMoN: depth and motion network for learning monocular stereo[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, USA, 2017: 5038-5047.
[13] Godard C, Mac Aodha O, Brostow G J. Unsupervised monocular depth estimation with left-right consistency[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, USA, 2017: 270-279.
[14] Bian J W, Li Z C, Wang N, et al. Unsupervised scale-consistent depth and ego-motion learning from monocular video[C]∥33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2019: 1-12.
[15] Han C, Cheng D, Kou Q, et al. Self-supervised monocular depth estimation with multi-scale structure similarity loss[J]. Multimedia Tools and Applications, 2022, 31: 3251-3266.
[16] Xiang J, Wang Y, An L, et al. Visual attention-based self-supervised absolute depth estimation using geometric priors in autonomous driving[J/OL]. (2022-10-06)[2023-06-13].
[17] Suri Z K. Pose constraints for consistent self-supervised monocular depth and ego-motion[J/OL]. (2023-04-18)[2023-06-13].
[18] Boulahbal H, Voicila A, Comport A. STDepthFormer: predicting spatio-temporal depth from video with a self-supervised transformer model[C]∥2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, USA, 2023: No.2303.01196.
[19] Poggi M, Aleotti F, Tosi F, et al. Towards real-time unsupervised monocular depth estimation on CPU[C]∥2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 2018: 5848-5854.
[20] Wofk D, Ma F C, Yang T J, et al. FastDepth: fast monocular depth estimation on embedded systems[C]∥2019 International Conference on Robotics and Automation (ICRA), Montreal, Canada, 2019: 6101-6108.
[21] Ramamonjisoa M, Firman M, Watson J, et al. Single image depth prediction with wavelet decomposition[C]∥2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online, 2021: 11089-11098.
[22] Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation[C]∥International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 2015: 234-241.
[23] Huang G, Liu Z, Maaten L V D, et al. Densely connected convolutional networks[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, USA, 2017: 2261-2269.
[24] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 770-778.
[25] Chen X T, Chen X J, Zha Z J. Structure-aware residual pyramid network for monocular depth estimation[C]∥28th International Joint Conference on Artificial Intelligence (IJCAI), Macau, China, 2019: 694-700.
[26] Geiger A, Lenz P, Stiller C, et al. Vision meets robotics: the KITTI dataset[J]. The International Journal of Robotics Research, 2013, 32(11): 1231-1237.
[27] Pleiss G, Chen D, Huang G, et al. Memory-efficient implementation of DenseNets[J/OL]. (2017-07-21)[2023-06-13].
[28] Mehta I, Sakurikar P, Narayanan P J. Structured adversarial training for unsupervised monocular depth estimation[C]∥2018 International Conference on 3D Vision (3DV), Verona, Italy, 2018: 314-323.
[29] Poggi M, Tosi F, Mattoccia S. Learning monocular depth estimation with unsupervised trinocular assumptions[C]∥2018 International Conference on 3D Vision (3DV), Verona, Italy, 2018: 324-333.
[30] Pillai S, Ambrus R, Gaidon A. SuperDepth: self-supervised, super-resolved monocular depth estimation[C]∥2019 International Conference on Robotics and Automation (ICRA), Montreal, Canada, 2019: 9250-9256.