Self-supervised 3D face reconstruction based on multi-scale feature fusion and dual attention mechanism

doi:10.13229/j.cnki.jdxbgxb20210630

Abstract

To address the limited accuracy of 3D face reconstruction methods and the scarcity of labeled 3D face samples, this paper presents a self-supervised 3D face reconstruction algorithm built on multi-scale feature fusion and a dual attention mechanism. The network takes a single face image as input and uses encoder-decoder networks to predict the reconstruction components. The multi-scale feature extraction and fusion module captures richer multi-scale facial features, and the dual attention modules integrated into the encoder-decoder networks further strengthen feature extraction. Because training is self-supervised from single images, the method avoids the demanding dataset requirements of traditional approaches. Comparative and ablation experiments were conducted on the BFM, Photoface, and CelebA face datasets. Compared with representative face reconstruction methods such as Unsup3D, the proposed method improves scale-invariant depth error (SIDE) by 10.3% and mean angle deviation (MAD) by 12.6%. It is also more robust to partial occlusion or missing regions in the input image.

Key words: pattern recognition and intelligent system, 3D face reconstruction, feature fusion, atrous convolution, attention mechanism, self-supervised learning
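
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the scale-invariant depth error takes the common log-difference form associated with reference [32] (with a square root applied) and that the mean angle deviation is the average angle, in degrees, between predicted and ground-truth surface normals. The function names `side` and `mad` and the flat-list inputs are hypothetical choices for illustration, not the authors' evaluation code.

```python
import math


def side(pred_depth, gt_depth):
    """Scale-invariant depth error over paired positive depth values.

    Assumed form: sqrt(mean(d^2) - mean(d)^2), with d = log(pred) - log(gt).
    """
    diffs = [math.log(p) - math.log(g) for p, g in zip(pred_depth, gt_depth)]
    n = len(diffs)
    mean = sum(diffs) / n
    mean_sq = sum(d * d for d in diffs) / n
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


def mad(pred_normals, gt_normals):
    """Mean angle deviation, in degrees, between paired 3D normal vectors."""
    angles = []
    for p, g in zip(pred_normals, gt_normals):
        dot = sum(a * b for a, b in zip(p, g))
        norm_p = math.sqrt(sum(a * a for a in p))
        norm_g = math.sqrt(sum(b * b for b in g))
        # Clamp the cosine so rounding cannot push it outside acos's domain.
        cos = max(-1.0, min(1.0, dot / (norm_p * norm_g)))
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)
```

Under this form, a prediction that differs from the ground truth only by a global scale factor has a SIDE of (numerically) zero, which is why the metric suits methods that recover depth only up to scale.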

CLC number:

TP391.4

Da-ke ZHOU, Chao ZHANG, Xin YANG. Self-supervised 3D face reconstruction based on multi-scale feature fusion and dual attention mechanism[J]. Journal of Jilin University (Engineering and Technology Edition), 2022, 52(10): 2428-2437.

References (35)

1	Xu Cheng-hua, Wang Yun-hong, Tan Tie-niu. Overview of research on 3D face modeling[J]. Journal of Image and Graphics, 2004, 9(8): 893-903.
2	Wang Kun, Zheng Nan-ning. 3D face modeling based on SFM algorithm[J]. Chinese Journal of Computers, 2005, 28(6): 1048-1053.
3	Zhu Wen-bin, Wu Hsiang-tao, Chen Ze-yu, et al. Reda: reinforced differentiable attribute for 3D face reconstruction[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 4958-4967.
4	Zhu Xiang-yu, Lei Zhen, Liu Xiao-ming, et al. Face alignment across large poses: a 3D solution[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 146-155.
5	Richardson E, Sela M, Or-El R, et al. Learning detailed face reconstruction from a single image[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 1259-1268.
6	Blanz V, Vetter T. A morphable model for the synthesis of 3D faces[C]∥Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, New York, USA, 1999: 187-194.
7	Booth J, Roussos A, Ponniah A, et al. Large scale 3D morphable models[J]. International Journal of Computer Vision, 2018, 126(2): 233-254.
8	Tran L, Liu Xiao-ming. On learning 3D face morphable model from in-the-wild images[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(1): 157-171.
9	Cao Chen, Weng Yan-lin, Zhou Shun, et al. Facewarehouse: a 3D facial expression database for visual computing[J]. IEEE Transactions on Visualization and Computer Graphics, 2013, 20(3): 413-425.
10	Tuan Tran A, Hassner T, Masi I, et al. Regressing robust and discriminative 3D morphable models with a very deep neural network[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 5163-5172.
11	Chang F J, Tran A T, Hassner T, et al. Expnet: Landmark-free, deep, 3D facial expressions[C]∥The 13th IEEE International Conference on Automatic Face and Gesture Recognition, Xi'an, China, 2018: 122-129.
12	Chang F J, Tuan Tran A, Hassner T, et al. Faceposenet: making a case for landmark-free face alignment[C]∥Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 2017: 1599-1608.
13	Gecer B, Ploumpis S, Kotsia I, et al. Ganfit: generative adversarial network fitting for high fidelity 3d face reconstruction[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 1155-1164.
14	Jackson A S, Bulat A, Argyriou V, et al. Large pose 3D face reconstruction from a single image via direct volumetric CNN regression[C]∥Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017: 1031-1039.
15	Feng Yao, Wu Fan, Shao Xiao-hu, et al. Joint 3d face reconstruction and dense alignment with position map regression network[C]∥Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 534-551.
16	Tewari A, Zollhofer M, Kim H, et al. Mofa: model-based deep convolutional face autoencoder for unsupervised monocular reconstruction[C]∥Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 2017: 1274-1283.
17	Sanyal S, Bolkart T, Feng Hai-wen, et al. Learning to regress 3D face shape and expression from an image without 3D supervision[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 7763-7772.
18	Tu Xiao-guang, Zhao Jian, Xie Mei, et al. 3D face reconstruction from a single image assisted by 2d face images in the wild[J]. IEEE Transactions on Multimedia, 2020, 23: 1160-1172.
19	Wu S Z, Rupprecht C, Vedaldi A. Unsupervised learning of probably symmetric deformable 3d objects from images in the wild[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 1-10.
20	Gao Yuan, Yuille A L. Exploiting symmetry and/or manhattan properties for 3d object structure estimation from single and multiple images[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7408-7417.
21	Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation[C]∥International Conference on Medical image computing and computer-assisted intervention, Munich, Germany, 2015: 234-241.
22	Zhang Ruo, Tsai P S, Cryer J E, et al. Shape-from-shading: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(8): 690-706.
23	Kato H, Ushiku Y, Harada T. Neural 3D mesh renderer[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 3907-3916.
24	Yang Mao-ke, Yu Kun, Zhang Chi, et al. Denseaspp for semantic segmentation in street scenes[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 3684-3692.
25	Woo S, Park J, Lee J Y, et al. Cbam: Convolutional block attention module[C]∥Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 3-19.
26	Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:, 2014.
27	Liu Zi-wei, Luo Ping, Wang Xiao-gang, et al. Deep learning face attributes in the wild[C]∥Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 3730-3738.
28	Paysan P, Knothe R, Amberg B, et al. A 3D face model for pose and illumination invariant face recognition[C]∥The Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, Genova, Italy, 2009: 296-301.
29	Zafeiriou S, Hansen M, Atkinson G, et al. The photoface database[C]∥CVPR Workshops, Colorado Springs, USA, 2011: 132-139.
30	Sengupta S, Kanazawa A, Castillo C D, et al. Sfsnet: learning shape, reflectance and illuminance of faces in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 44(6): 6296-6305.
31	Xiao Jian-xiong, Hays J, Ehinger K A, et al. Sun database: large-scale scene recognition from abbey to zoo[C]∥IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 3485-3492.
32	Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network[C]∥Proceedings of the 27th International Conference on Neural Information Processing Systems, Bangkok, Thailand, 2014: 2366-2374.
33	Sela M, Richardson E, Kimmel R. Unrestricted facial geometry reconstruction using image-to-image translation[C]∥Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017: 1576-1585.
34	Tran A T, Hassner T, Masi I, et al. Extreme 3D face reconstruction: seeing through occlusions[C]∥IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 3935-3944.
35	Trigeorgis G, Snape P, Kokkinos I, et al. Face normal "in-the-wild" using fully convolutional networks[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 38-47.

d	a	l	w	c	SIDE (×10^-2) ↓	MAD (°) ↓
√					0.7743	15.8709
	√				0.7754	15.8925
		√			0.7721	15.7134
			√		0.7743	15.6324
				√	0.7791	15.6525
√	√	√	√	√	0.7637	15.2986

d	a	l	w	c	SIDE (×10^-2) ↓	MAD (°) ↓
√					0.7778	15.6845
	√				0.7505	15.1609
		√			0.7512	15.4257
			√		0.7608	15.6892
				√	0.7697	15.8959
√	√	√	√	√	0.7160	14.7222

d	a	l	w	c	SIDE (×10^-2) ↓	MAD (°) ↓
√					0.7529	15.0565
	√				0.7369	14.7608
		√			0.7365	15.1116
			√		0.7572	15.1532
				√	0.7375	15.0053
√	√	√	√	√	0.7110	14.4342

Method	SIDE (×10^-2) ↓	MAD (°) ↓	MAE (×10^-2) ↓	MSE (×10^-4) ↓
Const. null depth [19]	2.723	43.34	-	-
Average g.t. depth [19]	1.990	23.26	-	-
Unsup3D [19]	0.793	16.51	0.579	0.715
Proposed method	0.711	14.43	0.537	0.626
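
As a quick arithmetic check, the relative improvements over Unsup3D can be recomputed from the SIDE and MAD values in the table above. The helper name `relative_improvement` is a hypothetical choice for illustration; it assumes that lower is better for both metrics, as the ↓ markers indicate.

```python
def relative_improvement(baseline, proposed):
    """Percentage reduction of an error metric relative to the baseline."""
    return (baseline - proposed) / baseline * 100.0


# Values copied from the Unsup3D row and the proposed-method row.
side_gain = relative_improvement(0.793, 0.711)
mad_gain = relative_improvement(16.51, 14.43)

print(f"SIDE: {side_gain:.1f}%  MAD: {mad_gain:.1f}%")
# prints "SIDE: 10.3%  MAD: 12.6%"
```

These figures agree with the 10.3% (SIDE) and 12.6% (MAD) improvements stated in the abstract.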

Method	MAD (°) ↓	Method	MAD (°) ↓
Pix2V [33]	33.9	SfSNet [30]	25.5
Extreme [34]	27.0	PRN [15]	24.8
FNI [35]	26.3	Unsup3D [19]	24.1
3DDFA [4]	26.0	Proposed method	23.7