Double-layer annotation of traditional costume images based on multi-task learning

doi:10.13229/j.cnki.jdxbgxb20190970



Abstract:

Current image multi-label annotation methods can annotate only the content of an image (ontology) and cannot also annotate its implied meaning (implicit information). To address this, this paper proposes a double-layer multi-label annotation model based on multi-task learning (MTL-DMAM). First, ontology annotation and implicit annotation are treated as two related tasks, with ResNeXt-50 as the backbone network that provides the shared features. Then, to realize double-layer annotation, an attention mechanism is used to build a separate branch for each task. To eliminate the influence of differing object sizes within an image on the annotation results, the ELASTIC structure is added to the model, further improving its performance. Comparative experiments show that on the single-task MS-COCO dataset the proposed model outperforms most state-of-the-art models in C-R, C-F1, O-R and mAP, and that on the multi-task traditional costume dataset it outperforms all other models on 10 metrics. Finally, Grad-CAM is used to visualize the image regions that MTL-DMAM focuses on when annotating, and the results show that the model effectively learns the salient image features corresponding to each label.
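The shared-backbone, two-branch design described in the abstract can be illustrated with the minimal sketch below. It is not the authors' implementation: the ResNeXt-50 output is replaced by a toy feature map given as a list of spatial position vectors, and the branch design (one softmax spatial attention map plus one linear sigmoid classifier per label), the hypothetical class `AttentionBranch`, and the equally weighted sum of the two binary cross-entropy losses are assumptions made for illustration. The ELASTIC structure is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionBranch:
    """Hypothetical task branch: per label, a spatial attention map plus a linear classifier."""

    def __init__(self, num_labels, channels, rng):
        # Small random weights; a real model would learn these by backpropagation.
        self.att = [[rng.gauss(0.0, 0.1) for _ in range(channels)] for _ in range(num_labels)]
        self.cls = [[rng.gauss(0.0, 0.1) for _ in range(channels)] for _ in range(num_labels)]
        self.bias = [0.0] * num_labels

    def forward(self, feats):
        """feats: list of H*W spatial positions, each a vector of `channels` values."""
        channels = len(feats[0])
        probs = []
        for a, w, b in zip(self.att, self.cls, self.bias):
            # Attention weights over spatial positions for this label.
            scores = [sum(ai * fi for ai, fi in zip(a, f)) for f in feats]
            alpha = softmax(scores)
            # Attention-weighted pooling gives a label-specific feature vector.
            pooled = [sum(alpha[p] * feats[p][c] for p in range(len(feats)))
                      for c in range(channels)]
            logit = sum(wi * xi for wi, xi in zip(w, pooled)) + b
            probs.append(sigmoid(logit))  # independent sigmoid per label (multi-label)
        return probs


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over the labels of one task."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def joint_loss(feats, ontology_branch, implicit_branch, y_ont, y_imp, w_ont=1.0, w_imp=1.0):
    """Both branches read the same shared backbone features; their losses are summed."""
    p_ont = ontology_branch.forward(feats)
    p_imp = implicit_branch.forward(feats)
    loss = w_ont * bce(p_ont, y_ont) + w_imp * bce(p_imp, y_imp)
    return loss, p_ont, p_imp


if __name__ == "__main__":
    rng = random.Random(0)
    channels, positions = 8, 7 * 7  # toy stand-in for a ResNeXt-50 output feature map
    feats = [[rng.random() for _ in range(channels)] for _ in range(positions)]
    ontology = AttentionBranch(num_labels=5, channels=channels, rng=rng)  # content labels
    implicit = AttentionBranch(num_labels=3, channels=channels, rng=rng)  # implied-meaning labels
    loss, p_ont, p_imp = joint_loss(feats, ontology, implicit, [1, 0, 0, 1, 0], [0, 1, 0])
    print(f"joint loss: {loss:.4f}")
    print("ontology probabilities:", [round(p, 3) for p in p_ont])
    print("implicit probabilities:", [round(p, 3) for p in p_imp])
```

Because each label gets its own sigmoid rather than a shared softmax, an image can carry any number of ontology and implicit labels at once; the loss weights `w_ont` and `w_imp` are where a task-balancing scheme would plug in.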

Key words: artificial intelligence, traditional costume, multi-task learning, multi-label annotation, attention mechanisms

CLC number: TP181

Cite this article: Hai-ying ZHAO, Wei ZHOU, Xiao-gang HOU, Xiao-li ZHANG. Double-layer annotation of traditional costume images based on multi-task learning[J]. Journal of Jilin University (Engineering and Technology Edition), 2021, 51(1): 293-302.

Figures and tables (10): Figures 1-7 and Tables 1-3 (images not reproduced here).

References (32)

1	Zhao Xin-quan. How to upgrade cultural consumption in internet era[J]. People's Tribune, 2019(23): 132-133.
2	Zhang Hui, Chen Chen. The international Chinese language education and cultural communication under "internet plus"[J]. Applied Linguistics, 2019(2): 30-38.
3	Mehta S, Rastegari M, Shapiro L, et al. ESPNetv2: a light-weight, power efficient, and general purpose convolutional neural network[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019: 9190-9200.
4	Wang H, Kembhavi A, Farhadi A, et al. ELASTIC: improving CNNs with dynamic scaling policies[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019: 2258-2267.
5	Mehta S, Hajishirzi H, Rastegari M. DiCENet: dimension-wise convolutions for efficient networks[J]. arXiv, 2019:1906.03516.
6	Zhang J, Ding S, Zhang N. An overview on probability undirected graphs and their applications in image processing[J]. Neurocomputing, 2018, 321: 156-168.
7	Chen Mian-shu, Yu Lu-lu, Su Yue, et al. Multi-label images classification based on convolutional neural network[J]. Journal of Jilin University (Engineering and Technology Edition), 2020, 50(3): 1077-1084.
8	Ding S, Du P, Zhao X, et al. BEMD image fusion based on PCNN and compressed sensing[J]. Soft Computing, 2019, 23(20): 10045-10054.
9	Wang Ke-yan, Hu Yan, Wang Huai, et al. Image dehazing algorithm by sky segmentation and superpixel-level dark channel[J]. Journal of Jilin University (Engineering and Technology Edition), 2019, 49(4): 1377-1384.
10	Chen Hua, Guo Wei, Yan Jing-wen, et al. A new deep learning method for roads recognition from SAR images[J]. Journal of Jilin University (Engineering and Technology Edition), 2020, 50(5): 1778-1787.
11	Gong Y, Jia Y, Leung T, et al. Deep convolutional ranking for multilabel image annotation[J]. arXiv, 2013:1312.4894.
12	Wang J, Yang Y, Mao J, et al. CNN-RNN: a unified framework for multi-label image classification[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016: 2285-2294.
13	Li Y, Song Y, Luo J. Improving pairwise ranking for multi-label image classification[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017: 3617-3625.
14	Cevikalp H, Benligiray B, Gerek O N, et al. Semi-supervised robust deep neural networks for multi-label classification[C]∥IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 2019: 9-17.
15	Zhang J, Wu Q, Shen C, et al. Multilabel image classification with regional latent semantic dependencies[J]. IEEE Transactions on Multimedia, 2018, 20(10): 2801-2813.
16	Wang Z, Chen T, Li G, et al. Multi-label image recognition by recurrently discovering attentional regions[C]∥Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017: 464-472.
17	Zhu F, Li H, Ouyang W, et al. Learning spatial regularization with image-level supervisions for multi-label image classification[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017: 5513-5522.
18	Luo Y, Jiang M, Zhao Q. Visual attention in multi-label image classification[C]∥IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 2019: 1-8.
19	Guo H, Zheng K, Fan X, et al. Visual attention consistency under image transforms for multi-label image classification[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019: 729-739.
20	Zhang Yu, Liu Jian-wei, Zuo Xin. Survey of multi-task learning[J]. Chinese Journal of Computers, 2020, 43(7): 1340-1378.
21	Kao Y, He R, Huang K. Deep aesthetic quality assessment with semantic information[J]. IEEE Transactions on Image Processing, 2017, 26(3): 1482-1495.
22	Misra I, Shrivastava A, Gupta A, et al. Cross-stitch networks for multi-task learning[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016: 3994-4003.
23	Vandenhende S, de Brabandere B, van Gool L. Branched multi-task networks: deciding what layers to share[J]. arXiv, 2019: 1904.02920.
24	Liu S, Johns E, Davison A J. End-to-end multi-task learning with attention[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019: 1871-1880.
25	Wang Song, Dang Jian-wu, Wang Yang-ping, et al. Action recognition based on 3D motion history image and multi-task learning[J]. Journal of Jilin University (Engineering and Technology Edition), 2020, 50(4): 1495-1502.
26	Zhao Hai-ying, Chen Hong, Jia Geng-yun, et al. Semantic annotation of national cultural patterns based on dictionary learning[J]. Science in China (Information Sciences), 2019, 49(2): 172-187.
27	Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017: 1492-1500.
28	Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv, 2014: 1409.1556.
29	He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016: 770-778.
30	Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016: 2818-2826.
31	Wu X Z, Zhou Z H. A unified view of multi-label performance measures[C]∥Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 2017: 3780-3788.
32	Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]∥Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017: 618-626.


Comparison on the MS-COCO dataset (%). C-P, C-R and C-F1 are per-class precision, recall and F1; O-P, O-R and O-F1 are their overall counterparts over all image-label pairs. ALL evaluates all predicted labels, top-3 only the three highest-scoring labels per image; +E marks a model with the ELASTIC structure.
	ALL							top-3
Method	mAP	C-P	C-R	C-F1	O-P	O-R	O-F1	C-P	C-R	C-F1	O-P	O-R	O-F1
WARP[11]	-	-	-	-	-	-	-	59.3	52.5	55.7	59.8	61.4	60.7
CNN-RNN[12]	-	-	-	-	-	-	-	66.0	55.6	60.4	69.2	66.4	67.8
CNN+LSTM[15]	61.8	-	-	-	-	-	-	62.1	51.2	56.1	68.1	56.6	61.8
MCG-CNN+LSTM[15]	64.4	-	-	-	-	-	-	64.2	53.1	58.1	61.3	59.3	61.3
RLSD[15]	68.2	-	-	-	-	-	-	67.6	57.2	62.0	70.1	63.4	66.5
RDAR[16]	-	-	-	-	-	-	-	79.1	58.7	67.4	84.0	63.0	72.0
ResNet-101[29]	75.2	80.8	63.4	69.5	82.2	68.0	74.4	84.3	57.4	65.9	86.5	61.3	71.7
ResNet-SRN[17]	77.1	81.6	65.4	71.2	82.7	69.9	75.8	85.2	58.8	67.4	87.4	62.5	72.9
ResNeXt-50	76.2	79.5	63.9	70.3	83.1	68.3	75.0	82.9	58.0	68.2	87.3	61.7	72.3
ResNeXt-50+E	77.2	81.2	65.0	71.2	83.3	69.4	75.7	83.4	58.9	69.1	87.6	62.6	73.0
MTL-DMAM+E	77.1	80.5	65.5	71.3	82.4	70.0	75.7	84.2	59.1	69.5	87.1	62.7	72.9
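The metrics reported in the table can be computed as in the following sketch. It assumes the definitions commonly used in the multi-label literature: C-P and C-R average per-class precision and recall over classes, with C-F1 taken as their harmonic mean; O-P, O-R and O-F1 pool counts over all image-label pairs; ALL keeps labels scoring above a 0.5 threshold; mAP is the mean over classes of non-interpolated average precision. The paper may use a different convention (for example, averaging per-class F1 instead), and all function names are hypothetical.

```python
def predict(scores, top_k=None, threshold=0.5):
    """Turn per-image label scores into 0/1 predictions.

    top_k=None: keep every label whose score exceeds `threshold` (the ALL setting).
    top_k=3:    keep the three highest-scoring labels of each image (the top-3 setting).
    """
    preds = []
    for row in scores:
        if top_k is None:
            preds.append([1 if s > threshold else 0 for s in row])
        else:
            keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:top_k])
            preds.append([1 if j in keep else 0 for j in range(len(row))])
    return preds


def _div(a, b):
    return a / b if b else 0.0  # assumed convention: 0/0 counts as 0


def _f1(p, r):
    return _div(2 * p * r, p + r)


def multilabel_metrics(preds, targets):
    """C-* metrics average over classes; O-* metrics pool all image-label pairs."""
    n_cls = len(targets[0])
    correct, predicted, actual = [0] * n_cls, [0] * n_cls, [0] * n_cls
    for pred_row, true_row in zip(preds, targets):
        for j in range(n_cls):
            correct[j] += pred_row[j] * true_row[j]
            predicted[j] += pred_row[j]
            actual[j] += true_row[j]
    cp = sum(_div(correct[j], predicted[j]) for j in range(n_cls)) / n_cls
    cr = sum(_div(correct[j], actual[j]) for j in range(n_cls)) / n_cls
    op = _div(sum(correct), sum(predicted))
    o_r = _div(sum(correct), sum(actual))
    return {"C-P": cp, "C-R": cr, "C-F1": _f1(cp, cr),
            "O-P": op, "O-R": o_r, "O-F1": _f1(op, o_r)}


def average_precision(scores, labels):
    """Non-interpolated AP of one class: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return _div(precision_sum, hits)


def mean_average_precision(scores, targets):
    n_cls = len(targets[0])
    aps = [average_precision([row[j] for row in scores], [t[j] for t in targets])
           for j in range(n_cls)]
    return sum(aps) / n_cls


if __name__ == "__main__":
    scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.7], [0.4, 0.3, 0.95]]
    targets = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
    print(multilabel_metrics(predict(scores), targets))           # ALL setting
    print(multilabel_metrics(predict(scores, top_k=1), targets))  # top-k setting
    print(mean_average_precision(scores, targets))  # (1 + 1 + 5/6) / 3 = 17/18
```

Restricting predictions to the top-k labels fixes how many labels each image receives, which usually raises precision and lowers recall relative to the thresholded ALL setting; the table's top-3 columns show that pattern.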

Results on the traditional costume dataset (%). Ontology annotation and implicit annotation are the two tasks; C- and O- metrics are per-class and overall precision (P), recall (R) and F1; +E marks a model with the ELASTIC structure.
	Ontology annotation						Implicit annotation
Method	C-P	C-R	C-F1	O-P	O-R	O-F1	C-P	C-R	C-F1	O-P	O-R	O-F1
MTL-FC	81.22	86.93	83.55	78.46	87.98	82.95	82.06	89.47	85.20	78.82	89.03	83.62
MTL-FC+E	83.18	88.54	85.47	80.78	89.26	84.81	84.25	90.27	86.80	80.48	89.96	84.96
MTL-Conv5	82.43	83.42	82.44	80.62	84.95	82.73	84.15	88.10	85.62	80.77	87.75	84.12
MTL-Conv5+E	84.15	86.92	85.01	81.89	88.10	84.88	86.03	89.40	87.24	82.72	88.80	85.65
MTL-Conv4	71.50	69.31	69.02	70.53	71.76	71.14	75.82	76.54	75.11	71.81	75.50	73.61
MTL-Conv4+E	79.31	82.40	80.41	77.83	83.55	80.59	81.41	86.46	83.47	78.38	85.88	81.96
MTL-Conv1	82.12	88.59	84.85	79.96	89.38	84.41	83.33	87.98	85.07	79.11	87.05	82.89
MTL-Conv1+E	84.42	85.74	84.87	83.43	86.35	84.86	82.65	90.56	86.13	79.67	89.61	84.35
MTL-DMAM	83.38	88.89	85.58	80.25	89.61	84.67	83.85	89.89	86.27	79.75	89.61	84.40
MTL-DMAM+E	84.12	87.76	85.22	81.24	88.91	84.90	85.20	90.71	87.36	81.99	90.32	85.95
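The abstract's final step visualizes the regions MTL-DMAM attends to with Grad-CAM. The sketch below shows the standard Grad-CAM computation for one label: each channel k of a convolutional feature map A^k is weighted by the spatial mean of the gradient of the label's score with respect to A^k, and the weighted sum is passed through a ReLU. It assumes the activations and gradients have already been extracted (in practice an autograd framework supplies them); which layer the paper visualizes is not stated here, and `grad_cam` and `normalize` are hypothetical helper names.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map for one label.

    activations: K x H x W feature maps A^k of the chosen convolutional layer.
    gradients:   K x H x W gradients d(label score) / dA^k, same shape.
    """
    num_ch, height, width = len(activations), len(activations[0]), len(activations[0][0])
    # alpha_k: global average of the gradient over the spatial positions of channel k
    alphas = [sum(sum(row) for row in gradients[k]) / (height * width) for k in range(num_ch)]
    # ReLU(sum_k alpha_k * A^k): keep only evidence that increases the label's score
    return [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(num_ch)))
             for j in range(width)]
            for i in range(height)]


def normalize(cam):
    """Scale the map to [0, 1] for display as a heat map."""
    peak = max(max(row) for row in cam)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in cam]


if __name__ == "__main__":
    acts = [[[1.0, 0.0], [0.0, 0.0]],       # channel 0 fires top-left
            [[0.0, 0.0], [0.0, 2.0]]]       # channel 1 fires bottom-right
    grads = [[[1.0, 1.0], [1.0, 1.0]],      # channel 0 raises the label score
             [[-1.0, -1.0], [-1.0, -1.0]]]  # channel 1 lowers it
    print(normalize(grad_cam(acts, grads)))  # [[1.0, 0.0], [0.0, 0.0]]
```

Only the top-left cell survives because channel 1 carries negative evidence for the label and the ReLU discards it. Upsampling the map to the input resolution and overlaying it on the image are left out of this sketch.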
