Journal of Jilin University(Engineering and Technology Edition) ›› 2024, Vol. 54 ›› Issue (8): 2307-2312.doi: 10.13229/j.cnki.jdxbgxb.20230128


Image classification framework based on knowledge distillation

Hong-wei ZHAO1, Hong WU1, Ke MA2, Hai LI1

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
    2. School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130025, China
  • Received: 2023-02-14  Online: 2024-08-01  Published: 2024-08-30

Abstract:

To address the difficulty of effectively fusing CNN and Transformer features in image classification, this paper proposes a knowledge-distillation-based image classification framework, Knowledge Distillation Image Classification (KDIC). The KDIC framework designs several knowledge distillation methods tailored to the structural differences between CNNs and Transformers: it transfers both the local features of CNNs and the global representations of Transformers into a lightweight student model, and introduces an effective loss function for each distillation method to improve classification performance. Image classification experiments were carried out on three public datasets: CIFAR10, CIFAR100, and UC-Merced. The results show that KDIC clearly outperforms current knowledge distillation methods, and that it maintains strong performance and good generalization across different teacher and student networks.
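The paper's own loss functions are not reproduced on this abstract page. As background, the soft-target distillation loss of Hinton et al. [13], the KD baseline that KDIC is compared against in Table 1, can be sketched in NumPy as follows; the temperature T and weight alpha here are illustrative hyper-parameters, not values taken from the paper:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation:
    (1 - alpha) * CE(student, labels) + alpha * T^2 * KL(teacher_T || student_T).
    The T^2 factor keeps the soft-target gradient scale independent of T."""
    p_t = softmax(teacher_logits, T)
    log_ratio = np.log(p_t) - np.log(softmax(student_logits, T))
    kl = np.sum(p_t * log_ratio, axis=-1).mean()          # KL per sample, averaged
    probs = softmax(student_logits)                        # hard-label cross-entropy
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    return (1 - alpha) * ce + alpha * T ** 2 * kl
```

When teacher and student produce identical logits, the KL term vanishes and only the weighted cross-entropy against the ground-truth labels remains.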

Key words: computer application, knowledge distillation, image classification, convolutional neural network

CLC Number: TP391

Fig.1

Overall architecture of KDIC framework

Table 1

Comparison experiment of different knowledge distillation methods

Student baselines:

| Student baseline | CIFAR10      | CIFAR100     | UC Merced    |
| ResNet18         | 93.77 ± 0.64 | 79.44 ± 0.47 | 94.23 ± 0.44 |
| ViT Tiny         | 94.19 ± 0.48 | 81.10 ± 0.32 | 95.94 ± 0.28 |

Knowledge distillation methods, teacher: ResNet101, student: ResNet18:

| Method      | CIFAR10      | CIFAR100     | UC Merced    |
| KD [13]     | 94.12 ± 0.83 | 80.32 ± 1.13 | 95.16 ± 0.84 |
| FitNet [14] | 94.28 ± 0.42 | 80.68 ± 0.81 | 95.83 ± 0.41 |
| HKD [15]    | 95.11 ± 0.61 | 81.05 ± 0.94 | 95.82 ± 0.66 |
| SemCKD [16] | 95.34 ± 0.55 | 80.89 ± 1.22 | 96.03 ± 0.54 |
| KDIC (ours) | 96.32 ± 0.23 | 81.82 ± 0.49 | 97.70 ± 0.25 |

Knowledge distillation methods, teacher: ViT Base, student: ViT Tiny:

| Method      | CIFAR10      | CIFAR100     | UC Merced    |
| KD [13]     | 94.88 ± 0.59 | 81.69 ± 0.54 | 96.39 ± 0.49 |
| FitNet [14] | 94.78 ± 0.54 | 81.60 ± 0.64 | 96.44 ± 0.70 |
| HKD [15]    | 95.12 ± 0.41 | 82.01 ± 0.59 | 96.60 ± 0.28 |
| SemCKD [16] | 95.48 ± 0.29 | 82.41 ± 1.02 | 96.66 ± 0.62 |
| KDIC (ours) | 97.10 ± 0.30 | 83.55 ± 0.44 | 97.80 ± 0.16 |

Fig.2

Confusion matrix of classification results on CIFAR10

Table 2

Experiment with different teacher (T)-student (S) models

| Method                 | S: MobileNetV3 [17] | S: Swin Tiny | S: ShuffleNetV2 [18] | S: Swin Small |
| T: ResNet50, Swin Base | 79.38               | 83.10        | 81.04                | 84.91         |

Table 3

Ablation experimental results of KDIC structure

Settings: a, b, c, d, e
Layer attention distillation (LAD): × ×
Multi-layer feature distillation (MFD): × ×
Feature alignment distillation (FAD): ×
Top-1, ResNet18: 81.35, 81.43, 81.57, 81.50, 81.94
Top-1, ViT Tiny: 82.98, 83.29, 83.09, 83.40, 83.69
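The layer attention distillation (LAD) component in Table 3 is only named on this page, not specified. A common form of attention-based distillation is attention transfer in the style of Zagoruyko et al. [7], which matches normalized per-layer spatial attention maps between teacher and student. A minimal NumPy sketch, assuming feature tensors of shape (N, C, H, W):

```python
import numpy as np

def attention_map(feat):
    """Spatial attention map: channel-wise mean of squared activations,
    flattened over the spatial dimensions and L2-normalized per sample."""
    a = (np.asarray(feat, dtype=float) ** 2).mean(axis=1)   # (N, H, W)
    a = a.reshape(a.shape[0], -1)                           # (N, H*W)
    return a / np.linalg.norm(a, axis=1, keepdims=True)

def attention_loss(feat_s, feat_t):
    """Squared L2 distance between student and teacher attention maps,
    averaged over the batch; channel counts may differ between the two."""
    diff = attention_map(feat_s) - attention_map(feat_t)
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

Because the maps are pooled over channels, the student and teacher layers being matched do not need the same channel count, only the same spatial resolution.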

Table 4

Ablation experimental results of knowledge distillation methods

Knowledge distillation within the Transformer module:

| Method                                 | ResNet18 | ViT Tiny |
| Layer attention distillation (LAD)     | 81.70    | 83.15    |
| Multi-layer feature distillation (MFD) | 81.94    | 83.69    |
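Multi-layer feature distillation (MFD) and feature alignment distillation (FAD) are likewise only named here. Feature-based distillation in the style of FitNet [14] typically projects student features into the teacher's feature dimension and penalizes the mean squared error between them. A minimal sketch with hypothetical dimensions (in practice the projection W would be learned jointly with the student):

```python
import numpy as np

def feature_distill_loss(f_s, f_t, W):
    """MSE between linearly projected student features and teacher features.
    f_s: (N, d_s) student features, f_t: (N, d_t) teacher features,
    W: (d_s, d_t) projection mapping the student into the teacher's space."""
    return float(np.mean((f_s @ W - f_t) ** 2))

# hypothetical dimensions: 64-d student features, 256-d teacher features, batch of 8
rng = np.random.default_rng(0)
f_s = rng.standard_normal((8, 64))
f_t = rng.standard_normal((8, 256))
W = rng.standard_normal((64, 256)) * 0.1   # learned in practice, random here
loss = feature_distill_loss(f_s, f_t, W)
```

For a multi-layer variant, this loss would simply be summed over several (student layer, teacher layer) pairs, each with its own projection.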
1 Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: transformers for image recognition at scale[DB/OL]. [2023-01-05].
2 Liu Z, Lin Y, Cao Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]∥ Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 10012-10022.
3 Zhao Hong-wei, Zhang Jian-rong, Zhu Jun-ping, et al. Image classification framework based on contrastive self-supervised learning[J]. Journal of Jilin University (Engineering and Technology Edition), 2022, 52(8): 1850-1856.
4 Zhao Hong-wei, Huo Dong-sheng, Wang Jie, et al. Image classification of insect pests based on saliency detection[J]. Journal of Jilin University (Engineering and Technology Edition), 2021, 51(6): 2174-2181.
5 Mehta S, Rastegari M. Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer[DB/OL]. [2023-01-05].
6 Chen Y, Dai X, Chen D, et al. Mobile-former: bridging mobilenet and transformer[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 5270-5279.
7 Zagoruyko S, Komodakis N. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer[DB/OL]. [2023-01-06].
8 Gou J, Yu B, Maybank S J, et al. Knowledge distillation: a survey[J]. International Journal of Computer Vision, 2021, 129: 1789-1819.
9 Huang Zhen-hua, Yang Shun-zhi, Lin Wei, et al. Research review on knowledge distillation[J]. Chinese Journal of Computers, 2022, 45(3): 624-653.
10 Raghu M, Unterthiner T, Kornblith S, et al. Do vision transformers see like convolutional neural networks?[J]. Advances in Neural Information Processing Systems, 2021, 34: 12116-12128.
11 He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770-778.
12 Yang Y, Newsam S. Bag-of-visual-words and spatial extensions for land-use classification[C]∥Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, USA, 2010: 270-279.
13 Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[DB/OL]. [2023-01-06].
14 Romero A, Ballas N, Kahou S E, et al. Fitnets: hints for thin deep nets[DB/OL]. [2023-01-07].
15 Passalis N, Tzelepi M, Tefas A. Heterogeneous knowledge distillation using information flow modeling[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 2339-2348.
16 Chen D, Mei J P, Zhang Y, et al. Cross-layer distillation with semantic calibration[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(8): 7028-7036.
17 Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea(South), 2019: 1314-1324.
18 Ma N, Zhang X, Zheng H T, et al. Shufflenet v2: practical guidelines for efficient CNN architecture design[C]∥Proceedings of the European Conference on Computer Vision, Munich, Germany, 2018: 116-131.