Journal of Jilin University(Engineering and Technology Edition) ›› 2024, Vol. 54 ›› Issue (8): 2307-2312.doi: 10.13229/j.cnki.jdxbgxb.20230128


Image classification framework based on knowledge distillation

Hong-wei ZHAO1, Hong WU1, Ke MA2, Hai LI1

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
    2. School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130025, China
  • Received: 2023-02-14  Online: 2024-08-01  Published: 2024-08-30

Abstract:

To address the difficulty of effectively fusing CNN and Transformer features in image classification, this paper proposes a knowledge-distillation-based image classification framework, Knowledge Distillation Image Classification (KDIC). The KDIC framework designs several knowledge distillation methods tailored to the structural differences between CNNs and Transformers: it transfers both the local features of CNNs and the global representations of Transformers into a lightweight student model, and introduces an effective loss function for each distillation method to improve classification performance. Image classification experiments were carried out on three public datasets: CIFAR10, CIFAR100, and UC-Merced. The results show that KDIC clearly outperforms current knowledge distillation methods, and that it maintains strong performance and good generalization across different teacher and student networks.
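The paper's own loss functions are not reproduced on this abstract page. As background, the soft-target distillation loss of Hinton et al. [13], the KD baseline that KDIC is compared against in Table 1, can be sketched in NumPy as follows; the temperature T and weight alpha here are illustrative hyper-parameters, not values taken from the paper:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation:
    (1 - alpha) * CE(student, labels) + alpha * T^2 * KL(teacher_T || student_T).
    The T^2 factor keeps the soft-target gradient scale independent of T."""
    p_t = softmax(teacher_logits, T)
    log_ratio = np.log(p_t) - np.log(softmax(student_logits, T))
    kl = np.sum(p_t * log_ratio, axis=-1).mean()          # KL per sample, averaged
    probs = softmax(student_logits)                        # hard-label cross-entropy
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    return (1 - alpha) * ce + alpha * T ** 2 * kl
```

When teacher and student produce identical logits, the KL term vanishes and only the weighted cross-entropy against the ground-truth labels remains.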

Key words: computer application, knowledge distillation, image classification, convolutional neural network

CLC Number: TP391

Fig.1

Overall architecture of KDIC framework

Table 1

Comparison experiment of different knowledge distillation methods

Student baselines:

| Student baseline | CIFAR10      | CIFAR100     | UC Merced    |
| ResNet18         | 93.77 ± 0.64 | 79.44 ± 0.47 | 94.23 ± 0.44 |
| ViT Tiny         | 94.19 ± 0.48 | 81.10 ± 0.32 | 95.94 ± 0.28 |

Knowledge distillation methods, teacher: ResNet101, student: ResNet18:

| Method      | CIFAR10      | CIFAR100     | UC Merced    |
| KD [13]     | 94.12 ± 0.83 | 80.32 ± 1.13 | 95.16 ± 0.84 |
| FitNet [14] | 94.28 ± 0.42 | 80.68 ± 0.81 | 95.83 ± 0.41 |
| HKD [15]    | 95.11 ± 0.61 | 81.05 ± 0.94 | 95.82 ± 0.66 |
| SemCKD [16] | 95.34 ± 0.55 | 80.89 ± 1.22 | 96.03 ± 0.54 |
| KDIC (ours) | 96.32 ± 0.23 | 81.82 ± 0.49 | 97.70 ± 0.25 |

Knowledge distillation methods, teacher: ViT Base, student: ViT Tiny:

| Method      | CIFAR10      | CIFAR100     | UC Merced    |
| KD [13]     | 94.88 ± 0.59 | 81.69 ± 0.54 | 96.39 ± 0.49 |
| FitNet [14] | 94.78 ± 0.54 | 81.60 ± 0.64 | 96.44 ± 0.70 |
| HKD [15]    | 95.12 ± 0.41 | 82.01 ± 0.59 | 96.60 ± 0.28 |
| SemCKD [16] | 95.48 ± 0.29 | 82.41 ± 1.02 | 96.66 ± 0.62 |
| KDIC (ours) | 97.10 ± 0.30 | 83.55 ± 0.44 | 97.80 ± 0.16 |

Fig.2

Confusion matrix of classification results on CIFAR10

Table 2

Experiment with different teacher (T)-student (S) models

| Method                 | S: MobileNetV3 [17] | S: Swin Tiny | S: ShuffleNetV2 [18] | S: Swin Small |
| T: ResNet50, Swin Base | 79.38               | 83.10        | 81.04                | 84.91         |

Table 3

Ablation experimental results of KDIC structure

Settings: a, b, c, d, e
Layer attention distillation (LAD): × ×
Multi-layer feature distillation (MFD): × ×
Feature alignment distillation (FAD): ×
Top-1, ResNet18: 81.35, 81.43, 81.57, 81.50, 81.94
Top-1, ViT Tiny: 82.98, 83.29, 83.09, 83.40, 83.69
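The layer attention distillation (LAD) component in Table 3 is only named on this page, not specified. A common form of attention-based distillation is attention transfer in the style of Zagoruyko et al. [7], which matches normalized per-layer spatial attention maps between teacher and student. A minimal NumPy sketch, assuming feature tensors of shape (N, C, H, W):

```python
import numpy as np

def attention_map(feat):
    """Spatial attention map: channel-wise mean of squared activations,
    flattened over the spatial dimensions and L2-normalized per sample."""
    a = (np.asarray(feat, dtype=float) ** 2).mean(axis=1)   # (N, H, W)
    a = a.reshape(a.shape[0], -1)                           # (N, H*W)
    return a / np.linalg.norm(a, axis=1, keepdims=True)

def attention_loss(feat_s, feat_t):
    """Squared L2 distance between student and teacher attention maps,
    averaged over the batch; channel counts may differ between the two."""
    diff = attention_map(feat_s) - attention_map(feat_t)
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

Because the maps are pooled over channels, the student and teacher layers being matched do not need the same channel count, only the same spatial resolution.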

Table 4

Ablation experimental results of knowledge distillation methods

Knowledge distillation within the Transformer module:

| Method                                 | ResNet18 | ViT Tiny |
| Layer attention distillation (LAD)     | 81.70    | 83.15    |
| Multi-layer feature distillation (MFD) | 81.94    | 83.69    |
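Multi-layer feature distillation (MFD) and feature alignment distillation (FAD) are likewise only named here. Feature-based distillation in the style of FitNet [14] typically projects student features into the teacher's feature dimension and penalizes the mean squared error between them. A minimal sketch with hypothetical dimensions (in practice the projection W would be learned jointly with the student):

```python
import numpy as np

def feature_distill_loss(f_s, f_t, W):
    """MSE between linearly projected student features and teacher features.
    f_s: (N, d_s) student features, f_t: (N, d_t) teacher features,
    W: (d_s, d_t) projection mapping the student into the teacher's space."""
    return float(np.mean((f_s @ W - f_t) ** 2))

# hypothetical dimensions: 64-d student features, 256-d teacher features, batch of 8
rng = np.random.default_rng(0)
f_s = rng.standard_normal((8, 64))
f_t = rng.standard_normal((8, 256))
W = rng.standard_normal((64, 256)) * 0.1   # learned in practice, random here
loss = feature_distill_loss(f_s, f_t, W)
```

For a multi-layer variant, this loss would simply be summed over several (student layer, teacher layer) pairs, each with its own projection.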
1 Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: transformers for image recognition at scale[DB/OL]. [2023-01-05].
2 Liu Z, Lin Y, Cao Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]∥ Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 10012-10022.
3 Zhao Hong-wei, Zhang Jian-rong, Zhu Jun-ping, et al. Image classification framework based on contrastive self-supervised learning[J]. Journal of Jilin University (Engineering and Technology Edition), 2022, 52(8): 1850-1856.
4 Zhao Hong-wei, Huo Dong-sheng, Wang Jie, et al. Image classification of insect pests based on saliency detection[J]. Journal of Jilin University (Engineering and Technology Edition), 2021, 51(6): 2174-2181.
5 Mehta S, Rastegari M. Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer[DB/OL]. [2023-01-05].
6 Chen Y, Dai X, Chen D, et al. Mobile-former: bridging mobilenet and transformer[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 5270-5279.
7 Zagoruyko S, Komodakis N. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer[DB/OL]. [2023-01-06].
8 Gou J, Yu B, Maybank S J, et al. Knowledge distillation: a survey[J]. International Journal of Computer Vision, 2021, 129: 1789-1819.
9 Huang Zhen-hua, Yang Shun-zhi, Lin Wei, et al. Research review on knowledge distillation[J]. Chinese Journal of Computers, 2022, 45(3): 624-653.
10 Raghu M, Unterthiner T, Kornblith S, et al. Do vision transformers see like convolutional neural networks?[J]. Advances in Neural Information Processing Systems, 2021, 34: 12116-12128.
11 He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770-778.
12 Yang Y, Newsam S. Bag-of-visual-words and spatial extensions for land-use classification[C]∥Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, USA, 2010: 270-279.
13 Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[DB/OL]. [2023-01-06].
14 Romero A, Ballas N, Kahou S E, et al. Fitnets: hints for thin deep nets[DB/OL]. [2023-01-07].
15 Passalis N, Tzelepi M, Tefas A. Heterogeneous knowledge distillation using information flow modeling[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 2339-2348.
16 Chen D, Mei J P, Zhang Y, et al. Cross-layer distillation with semantic calibration[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(8): 7028-7036.
17 Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea(South), 2019: 1314-1324.
18 Ma N, Zhang X, Zheng H T, et al. Shufflenet v2: practical guidelines for efficient CNN architecture design[C]∥Proceedings of the European Conference on Computer Vision, Munich, Germany, 2018: 116-131.