Journal of Jilin University (Engineering and Technology Edition) ›› 2025, Vol. 55 ›› Issue (6): 2082-2088. doi: 10.13229/j.cnki.jdxbgxb.20230964

• Computer Science and Technology •

Semi-supervised Monocular Depth Estimation Framework Based on Data Augmentation

Hong-wei ZHAO, Wei-min ZHOU

  1. College of Software, Jilin University, Changchun 130012, China
  • Received: 2023-09-10 Online: 2025-06-01 Published: 2025-07-23
  • About the author: Hong-wei ZHAO (1962-), male, professor, Ph.D. Research interests: embedded artificial intelligence. E-mail: zhaohw@jlu.edu.cn
  • Supported by:
    Science and Technology Innovation Special Project of Jilin Province (20190302026GX); Natural Science Foundation of Jilin Province (20200201037JC)



Abstract:

To address the problem that supervised learning requires a large amount of labeled data for monocular depth estimation, a semi-supervised depth estimation framework based on a teacher-student model, AugDepth, was proposed. It perturbs the data and trains the model to keep the estimated depth consistent before and after the perturbation. First, a smooth random intensity augmentation method samples augmentation strengths from a continuous domain, randomly selects multiple operations to increase data randomness, and mixes the strongly and weakly augmented outputs to prevent excessive perturbation. Second, considering that unlabeled samples vary in training difficulty, Cutout is applied to improve the model's reasoning over global information, and the Cutout strategy is adaptively adjusted according to the model's confidence on each unlabeled sample, which improves generalization and learning ability. Experimental results on the KITTI and NYU-Depth datasets show that AugDepth significantly improves the accuracy of semi-supervised depth estimation and remains robust when labeled data are scarce.

Key words: computer application; semi-supervised learning; data augmentation; monocular image; depth estimation
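The teacher-student consistency idea summarized above can be sketched as follows, in the spirit of the mean-teacher scheme of ref. [11]. The function names, the EMA coefficient, and the L1 form of the consistency loss are illustrative assumptions; the paper's exact formulation is not given in this abstract.

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """Mean-teacher update: teacher weights track an exponential
    moving average of the student's weights (ref. [11])."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}

def consistency_loss(depth_teacher, depth_student):
    """Penalize disagreement between depth predictions on weakly and
    strongly augmented views of the same unlabeled image (assumed L1 form)."""
    return float(np.mean(np.abs(depth_teacher - depth_student)))
```

In such a scheme the student would be trained with a supervised loss on labeled data plus this consistency term on unlabeled data, while the teacher is updated only through `ema_update`.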

CLC number:

  • TP391

Figure 1  AugDepth framework diagram

Table 1  Augmentation pool

Augmentation operation | Strategy details
Identity               | Return the original image
Equalize               | Equalize the image histogram
Gaussian blur          | Blur the image with a Gaussian kernel
Contrast               | Adjust image contrast within [0.05, 0.95]
Sharpness              | Adjust image sharpness within [0.05, 0.95]
Color                  | Adjust image color balance within [0.05, 0.95]
Brightness             | Adjust image brightness within [0.05, 0.95]
Posterize              | Reduce each pixel to [4, 8] bits
Solarize               | Invert pixels above a threshold in [1, 256]
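The smooth random intensity strategy built on this pool can be sketched as follows: magnitudes are drawn from a continuous range rather than discrete levels, several operations are chosen at random, and the strong view is blended with the weak one to avoid over-perturbation. The operation implementations below are simplified stand-ins on float images in [0, 1]; `POOL`, `strong_augment`, `mix_views`, and the blend coefficient are illustrative assumptions, not the paper's exact code.

```python
import random
import numpy as np

def identity(img, m):
    return img

def brightness(img, m):
    return np.clip(img * (0.5 + m), 0.0, 1.0)

def contrast(img, m):
    mean = img.mean()
    return np.clip(mean + (0.5 + m) * (img - mean), 0.0, 1.0)

def solarize(img, m):
    # Invert pixels above a continuously sampled threshold m.
    return np.where(img > m, 1.0 - img, img)

POOL = [identity, brightness, contrast, solarize]

def strong_augment(img, n_ops=2, lo=0.05, hi=0.95, rng=random):
    """Apply n_ops randomly chosen operations, each with a magnitude
    sampled uniformly from the continuous range [lo, hi]."""
    for op in rng.sample(POOL, n_ops):
        img = op(img, rng.uniform(lo, hi))
    return img

def mix_views(weak, strong, lam):
    """Blend weak and strong views; lam = 1 keeps the fully strong view."""
    return lam * strong + (1.0 - lam) * weak
```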

Figure 2  Adaptive Cutout augmentation
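A confidence-adaptive Cutout of the kind described in the abstract might look like the sketch below, where high-confidence unlabeled samples receive larger erased regions (a harder consistency task) and low-confidence samples smaller ones. The linear confidence-to-size schedule and all names here are assumptions for illustration; the paper's actual schedule is not specified in this abstract.

```python
import numpy as np

def adaptive_cutout(img, confidence, max_frac=0.5, rng=None):
    """Erase one rectangular patch whose side length scales linearly
    with the model's confidence on this unlabeled sample (assumed schedule)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    frac = max_frac * float(confidence)            # assumed linear scaling
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    y = int(rng.integers(0, h - ch + 1))           # top-left corner of patch
    x = int(rng.integers(0, w - cw + 1))
    out = img.copy()
    out[y:y + ch, x:x + cw] = 0.0                  # zero out the patch
    return out
```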

Figure 3  Quantitative results with different numbers of labels

Table 2  Quantitative results on the KITTI dataset

Method              | Supervision     | AbsRel | SqRel | RMSE  | RMSElog | δ1
DORN [15]           | Supervised      | 0.072  | 0.307 | 2.727 | 0.120   | 0.932
LapDepth (ResNet50) | Supervised      | 0.064  | 0.259 | 2.828 | 0.102   | 0.949
Monodepth2 [16]     | Self-supervised | 0.080  | 0.466 | 3.681 | 0.127   | 0.926
FeatDepth [17]      | Self-supervised | 0.079  | 0.666 | 3.922 | 0.163   | 0.925
Cho [5]             | Semi-supervised | 0.095  | 0.613 | 4.129 | 0.175   | 0.884
SemiDepth [18]      | Semi-supervised | 0.078  | 0.417 | 3.464 | 0.126   | 0.923
Baek [14]           | Semi-supervised | 0.071  | 0.316 | 3.049 | 0.111   | 0.941
Ours                | Semi-supervised | 0.062  | 0.251 | 2.726 | 0.098   | 0.954
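The metrics reported in Tables 2 and 3 are the standard monocular-depth error and accuracy measures; a minimal reference implementation is sketched below, using the conventional 1.25 threshold for δ1.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth-estimation metrics: AbsRel, SqRel, RMSE, RMSElog,
    and δ1 (fraction of pixels with max(pred/gt, gt/pred) < 1.25)."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    sq_rel = np.mean((pred - gt) ** 2 / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)
    return {"AbsRel": abs_rel, "SqRel": sq_rel, "RMSE": rmse,
            "RMSElog": rmse_log, "delta1": delta1}
```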

Table 3  Quantitative results on the NYU-Depth dataset

Method          | AbsRel | RMSE  | δ1
Eigen [1]       | 0.158  | 0.641 | 0.769
DORN            | 0.115  | 0.509 | 0.828
BTS             | 0.112  | 0.352 | 0.882
LapDepth        | 0.110  | 0.393 | 0.885
DPT-Hybrid [19] | 0.110  | 0.357 | 0.904
Baek [14]       | 0.109  | 0.392 | 0.894
Ours            | 0.105  | 0.395 | 0.889

Table 4  Ablation study of AugDepth

MT | As | Ac | AbsRel             | δ1
–  | –  | –  | 0.066 (supervised) | 0.948 (supervised)
✓  | –  | –  | 0.066              | 0.949
✓  | ✓  | –  | 0.065              | 0.950
✓  | –  | ✓  | 0.064              | 0.952
✓  | ✓  | ✓  | 0.062              | 0.954

Table 5  Comparison of data augmentation effects

Method               | AbsRel | δ1
RandAugment          | 0.066  | 0.951
Cutout               | 0.065  | 0.951
RandAugment + Cutout | 0.064  | 0.953
As                   | 0.065  | 0.950
Ac                   | 0.064  | 0.952
Ac + As              | 0.062  | 0.954
[1] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network[C]∥Advances in Neural Information Processing Systems, Montreal, Canada, 2014: 2366-2374.
[2] Song M, Lim S, Kim W. Monocular depth estimation using laplacian pyramid-based depth residuals[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(11): 4381-4393.
[3] Lee J H, Han M K, Ko D W, et al. From big to small: multi-scale local planar guidance for monocular depth estimation[J/OL].[2023-08-26].
[4] Ji R, Li K, Wang Y, et al. Semi-supervised adversarial monocular depth estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(10): 2410-2422.
[5] Cho J, Min D, Kim Y, et al. A large RGB-D dataset for semi-supervised monocular depth estimation[J/OL]. [2023-08-27].
[6] Guo X, Li H, Yi S, et al. Learning monocular depth by distilling cross-domain stereo networks[C]∥Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 506-523.
[7] Cubuk E D, Zoph B, Shlens J, et al. Randaugment: practical automated data augmentation with a reduced search space[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, USA, 2020: 702-703.
[8] Zhao Z, Yang L, Long S, et al. Augmentation matters: a simple-yet-effective approach to semi-supervised semantic segmentation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 11350-11359.
[9] Zhao Z, Long S, Pi J, et al. Instance-specific and model-adaptive supervision for semi-supervised semantic segmentation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 23705-23714.
[10] de Vries T, Taylor G W. Improved regularization of convolutional neural networks with cutout[J/OL].[2023-08-28].
[11] Tarvainen A, Valpola H. Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results[C]∥Advances in Neural Information Processing Systems, Vancouver, Canada, 2017: 1195-1204.
[12] Yuan J, Liu Y, Shen C, et al. A simple baseline for semi-supervised semantic segmentation with strong data augmentation[C]∥IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 8209-8218.
[13] Poggi M, Aleotti F, Tosi F, et al. On the uncertainty of self-supervised monocular depth estimation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 3227-3237.
[14] Baek J, Kim G, Park S, et al. MaskingDepth: masked consistency regularization for semi-supervised monocular depth estimation[J/OL]. [2023-08-29].
[15] Fu H, Gong M, Wang C, et al. Deep ordinal regression network for monocular depth estimation[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Los Alamitos,USA, 2018: 2002-2011.
[16] Godard C, Aodha O M, Firman M, et al. Digging into self-supervised monocular depth estimation[C]∥IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 2019: 3827-3837.
[17] Shu C, Yu K, Duan Z, et al. Feature-metric loss for self-supervised learning of depth and egomotion[C]∥European Conference on Computer Vision,Glasgow, UK, 2020: 572-588.
[18] Amiri A J, Loo S Y, Zhang H. Semi-supervised monocular depth estimation with left-right consistency using deep neural network[C]∥IEEE International Conference on Robotics and Biomimetics (ROBIO), Dali, China, 2019: 602-607.
[19] Ranftl R, Bochkovskiy A, Koltun V. Vision transformers for dense prediction[C]∥IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 12159-12168.