Journal of Jilin University (Engineering and Technology Edition) ›› 2025, Vol. 55 ›› Issue (6): 2082-2088. doi: 10.13229/j.cnki.jdxbgxb.20230964

• Computer Science and Technology •

Semi-supervised Monocular Depth Estimation Framework Based on Data Augmentation

Hong-wei ZHAO, Wei-min ZHOU

  1. College of Software, Jilin University, Changchun 130012, China
  • Received: 2023-09-10 Online: 2025-06-01 Published: 2025-07-23
  • About the author: Hong-wei ZHAO (1962-), male, professor, Ph.D. Research interests: embedded artificial intelligence. E-mail: zhaohw@jlu.edu.cn
  • Supported by:
    Science and Technology Innovation Special Project of Jilin Province (20190302026GX); Natural Science Foundation of Jilin Province (20200201037JC)



Abstract:

To address the problem that supervised learning requires a large amount of labeled data for monocular depth estimation, a semi-supervised depth estimation framework based on a teacher-student model, AugDepth, was proposed. It perturbs the data and trains the model to keep the estimated depth consistent before and after the perturbation. First, a smooth random intensity augmentation method samples augmentation strengths from a continuous domain, randomly selects multiple operations to increase data randomness, and mixes the strongly and weakly augmented outputs to prevent excessive perturbation. Second, considering that unlabeled samples vary in training difficulty, Cutout is applied to improve the model's reasoning over global information, and the Cutout strategy is adaptively adjusted according to the model's confidence on each unlabeled sample, which improves generalization and learning ability. Experimental results on the KITTI and NYU-Depth datasets show that AugDepth significantly improves the accuracy of semi-supervised depth estimation and remains robust when labeled data are scarce.

Key words: computer application; semi-supervised learning; data augmentation; monocular image; depth estimation
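The teacher-student consistency idea summarized above can be sketched as follows, in the spirit of the mean-teacher scheme of ref. [11]. The function names, the EMA coefficient, and the L1 form of the consistency loss are illustrative assumptions; the paper's exact formulation is not given in this abstract.

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """Mean-teacher update: teacher weights track an exponential
    moving average of the student's weights (ref. [11])."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}

def consistency_loss(depth_teacher, depth_student):
    """Penalize disagreement between depth predictions on weakly and
    strongly augmented views of the same unlabeled image (assumed L1 form)."""
    return float(np.mean(np.abs(depth_teacher - depth_student)))
```

In such a scheme the student would be trained with a supervised loss on labeled data plus this consistency term on unlabeled data, while the teacher is updated only through `ema_update`.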

CLC number:

  • TP391

Figure 1  AugDepth framework diagram

Table 1  Augmentation pool

Augmentation operation | Strategy details
Identity               | Return the original image
Equalize               | Equalize the image histogram
Gaussian blur          | Blur the image with a Gaussian kernel
Contrast               | Adjust image contrast within [0.05, 0.95]
Sharpness              | Adjust image sharpness within [0.05, 0.95]
Color                  | Adjust image color balance within [0.05, 0.95]
Brightness             | Adjust image brightness within [0.05, 0.95]
Posterize              | Reduce each pixel to [4, 8] bits
Solarize               | Invert pixels above a threshold in [1, 256]
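The smooth random intensity strategy built on this pool can be sketched as follows: magnitudes are drawn from a continuous range rather than discrete levels, several operations are chosen at random, and the strong view is blended with the weak one to avoid over-perturbation. The operation implementations below are simplified stand-ins on float images in [0, 1]; `POOL`, `strong_augment`, `mix_views`, and the blend coefficient are illustrative assumptions, not the paper's exact code.

```python
import random
import numpy as np

def identity(img, m):
    return img

def brightness(img, m):
    return np.clip(img * (0.5 + m), 0.0, 1.0)

def contrast(img, m):
    mean = img.mean()
    return np.clip(mean + (0.5 + m) * (img - mean), 0.0, 1.0)

def solarize(img, m):
    # Invert pixels above a continuously sampled threshold m.
    return np.where(img > m, 1.0 - img, img)

POOL = [identity, brightness, contrast, solarize]

def strong_augment(img, n_ops=2, lo=0.05, hi=0.95, rng=random):
    """Apply n_ops randomly chosen operations, each with a magnitude
    sampled uniformly from the continuous range [lo, hi]."""
    for op in rng.sample(POOL, n_ops):
        img = op(img, rng.uniform(lo, hi))
    return img

def mix_views(weak, strong, lam):
    """Blend weak and strong views; lam = 1 keeps the fully strong view."""
    return lam * strong + (1.0 - lam) * weak
```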

Figure 2  Adaptive Cutout augmentation
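A confidence-adaptive Cutout of the kind described in the abstract might look like the sketch below, where high-confidence unlabeled samples receive larger erased regions (a harder consistency task) and low-confidence samples smaller ones. The linear confidence-to-size schedule and all names here are assumptions for illustration; the paper's actual schedule is not specified in this abstract.

```python
import numpy as np

def adaptive_cutout(img, confidence, max_frac=0.5, rng=None):
    """Erase one rectangular patch whose side length scales linearly
    with the model's confidence on this unlabeled sample (assumed schedule)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    frac = max_frac * float(confidence)            # assumed linear scaling
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    y = int(rng.integers(0, h - ch + 1))           # top-left corner of patch
    x = int(rng.integers(0, w - cw + 1))
    out = img.copy()
    out[y:y + ch, x:x + cw] = 0.0                  # zero out the patch
    return out
```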

Figure 3  Quantitative results with different numbers of labels

Table 2  Quantitative results on the KITTI dataset

Method              | Supervision     | AbsRel | SqRel | RMSE  | RMSElog | δ1
DORN [15]           | Supervised      | 0.072  | 0.307 | 2.727 | 0.120   | 0.932
LapDepth (ResNet50) | Supervised      | 0.064  | 0.259 | 2.828 | 0.102   | 0.949
Monodepth2 [16]     | Self-supervised | 0.080  | 0.466 | 3.681 | 0.127   | 0.926
FeatDepth [17]      | Self-supervised | 0.079  | 0.666 | 3.922 | 0.163   | 0.925
Cho [5]             | Semi-supervised | 0.095  | 0.613 | 4.129 | 0.175   | 0.884
SemiDepth [18]      | Semi-supervised | 0.078  | 0.417 | 3.464 | 0.126   | 0.923
Baek [14]           | Semi-supervised | 0.071  | 0.316 | 3.049 | 0.111   | 0.941
Ours                | Semi-supervised | 0.062  | 0.251 | 2.726 | 0.098   | 0.954
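The metrics reported in Tables 2 and 3 are the standard monocular-depth error and accuracy measures; a minimal reference implementation is sketched below, using the conventional 1.25 threshold for δ1.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth-estimation metrics: AbsRel, SqRel, RMSE, RMSElog,
    and δ1 (fraction of pixels with max(pred/gt, gt/pred) < 1.25)."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    sq_rel = np.mean((pred - gt) ** 2 / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)
    return {"AbsRel": abs_rel, "SqRel": sq_rel, "RMSE": rmse,
            "RMSElog": rmse_log, "delta1": delta1}
```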

Table 3  Quantitative results on the NYU-Depth dataset

Method          | AbsRel | RMSE  | δ1
Eigen [1]       | 0.158  | 0.641 | 0.769
DORN            | 0.115  | 0.509 | 0.828
BTS             | 0.112  | 0.352 | 0.882
LapDepth        | 0.110  | 0.393 | 0.885
DPT-Hybrid [19] | 0.110  | 0.357 | 0.904
Baek [14]       | 0.109  | 0.392 | 0.894
Ours            | 0.105  | 0.395 | 0.889

Table 4  Ablation study of AugDepth

MT | As | Ac | AbsRel             | δ1
–  | –  | –  | 0.066 (supervised) | 0.948 (supervised)
✓  | –  | –  | 0.066              | 0.949
✓  | ✓  | –  | 0.065              | 0.950
✓  | –  | ✓  | 0.064              | 0.952
✓  | ✓  | ✓  | 0.062              | 0.954

Table 5  Comparison of data augmentation effects

Method               | AbsRel | δ1
RandAugment          | 0.066  | 0.951
Cutout               | 0.065  | 0.951
RandAugment + Cutout | 0.064  | 0.953
As                   | 0.065  | 0.950
Ac                   | 0.064  | 0.952
Ac + As              | 0.062  | 0.954
[1] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network[C]∥Advances in Neural Information Processing Systems, Montreal, Canada, 2014: 2366-2374.
[2] Song M, Lim S, Kim W. Monocular depth estimation using laplacian pyramid-based depth residuals[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(11): 4381-4393.
[3] Lee J H, Han M K, Ko D W, et al. From big to small: multi-scale local planar guidance for monocular depth estimation[J/OL].[2023-08-26].
[4] Ji R, Li K, Wang Y, et al. Semi-supervised adversarial monocular depth estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(10): 2410-2422.
[5] Cho J, Min D, Kim Y, et al. A large RGB-D dataset for semi-supervised monocular depth estimation[J/OL]. [2023-08-27].
[6] Guo X, Li H, Yi S, et al. Learning monocular depth by distilling cross-domain stereo networks[C]∥Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 506-523.
[7] Cubuk E D, Zoph B, Shlens J, et al. Randaugment: practical automated data augmentation with a reduced search space[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, USA, 2020: 702-703.
[8] Zhao Z, Yang L, Long S, et al. Augmentation matters: a simple-yet-effective approach to semi-supervised semantic segmentation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 11350-11359.
[9] Zhao Z, Long S, Pi J, et al. Instance-specific and model-adaptive supervision for semi-supervised semantic segmentation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 23705-23714.
[10] de Vries T, Taylor G W. Improved regularization of convolutional neural networks with cutout[J/OL].[2023-08-28].
[11] Tarvainen A, Valpola H. Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results[C]∥Advances in Neural Information Processing Systems, Vancouver, Canada, 2017: 1195-1204.
[12] Yuan J, Liu Y, Shen C, et al. A simple baseline for semi-supervised semantic segmentation with strong data augmentation[C]∥IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 8209-8218.
[13] Poggi M, Aleotti F, Tosi F, et al. On the uncertainty of self-supervised monocular depth estimation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 3227-3237.
[14] Baek J, Kim G, Park S, et al. MaskingDepth: masked consistency regularization for semi-supervised monocular depth estimation[J/OL]. [2023-08-29].
[15] Fu H, Gong M, Wang C, et al. Deep ordinal regression network for monocular depth estimation[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Los Alamitos,USA, 2018: 2002-2011.
[16] Godard C, Aodha O M, Firman M, et al. Digging into self-supervised monocular depth estimation[C]∥IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 2019: 3827-3837.
[17] Shu C, Yu K, Duan Z, et al. Feature-metric loss for self-supervised learning of depth and egomotion[C]∥European Conference on Computer Vision,Glasgow, UK, 2020: 572-588.
[18] Amiri A J, Loo S Y, Zhang H. Semi-supervised monocular depth estimation with left-right consistency using deep neural network[C]∥IEEE International Conference on Robotics and Biomimetics (ROBIO), Dali, China, 2019: 602-607.
[19] Ranftl R, Bochkovskiy A, Koltun V. Vision transformers for dense prediction[C]∥IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 12159-12168.