Journal of Jilin University Science Edition ›› 2022, Vol. 60 ›› Issue (4): 881-888.


Activation Map Adaptation Model for Knowledge Distillation

WU Zhiyuan1,2, QI Hong1,3, JIANG Yu1,3, CUI Chupeng1, YANG Zongmin1, XUE Xinhui1   

1. College of Computer Science and Technology, Jilin University, Changchun 130012, China;
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
3. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China

Received: 2021-06-21  Online: 2022-07-26  Published: 2022-07-26

Abstract: Aiming at the problems that the computational and storage resources of embedded and mobile devices are limited and that compact network optimization easily converges to poor local optima, we proposed an activation map adaptation model for knowledge distillation, composed of an activation map adapter and an activation map adaptation knowledge distillation strategy. Firstly, the activation map adapter realized activation map size matching, synchronous transformation of teacher-student network features, and adaptive semantic information matching through heterogeneous convolution and the stacking of visual feature expression modules. Secondly, the activation map adaptation knowledge distillation strategy embedded the adapter into the teacher network to reconstruct it, adaptively searched for hidden-layer supervision features suited to the student network during training, and used the front output of the adapter to prompt the training of the front layers of the student network, so as to transfer knowledge from the teacher network to the student network and further optimize the student network under the constraint of the learning rate. Finally, experimental verification was carried out on the CIFAR-10 image classification dataset. The results show that the activation map adaptation knowledge distillation model improves classification accuracy by 0.6%, reduces inference loss by 6.5%, and cuts the time to converge to 78.2% accuracy to 5.6% of that required without model transfer.
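
To make the adapter's role concrete, the following is a minimal PyTorch sketch of such an activation map adapter. The specific block structure (mixing 1x1 and 3x3 kernels as the "heterogeneous convolution", with adaptive pooling for size matching) and all names are our assumptions for illustration, not the paper's exact design.

import torch.nn as nn

class ActivationMapAdapter(nn.Module):
    # Maps a teacher activation map to the student's channel count and
    # spatial size; the internal block layout is assumed, not the
    # authors' exact architecture.
    def __init__(self, in_channels, out_channels, out_size, num_blocks=2):
        super().__init__()
        layers = []
        ch = in_channels
        for _ in range(num_blocks):
            # "heterogeneous convolution": 1x1 and 3x3 kernels in one
            # stacked visual feature expression module
            layers += [
                nn.Conv2d(ch, out_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_channels, out_channels, kernel_size=3,
                          padding=1, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            ]
            ch = out_channels
        self.blocks = nn.Sequential(*layers)
        # match the student's spatial resolution regardless of input size
        self.resize = nn.AdaptiveAvgPool2d(out_size)

    def forward(self, teacher_feat):
        return self.resize(self.blocks(teacher_feat))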
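
The distillation strategy can then be sketched as a single training step in which the adapter's output supervises the student's hidden features; the features/head interfaces on the teacher and student, and the MSE hint loss, are illustrative assumptions rather than the paper's stated formulation.

import torch
import torch.nn.functional as F

def distillation_step(teacher, student, adapter, images, labels, alpha=0.5):
    # teacher.features / student.features / student.head are assumed
    # hooks for hidden-layer activation maps and the classifier head.
    with torch.no_grad():              # teacher weights stay frozen
        t_feat = teacher.features(images)
    hint = adapter(t_feat)             # adapted supervision features
                                       # (gradients still train the adapter)
    s_feat = student.features(images)
    logits = student.head(s_feat)
    # task loss plus hint loss on the student's hidden activation maps
    return F.cross_entropy(logits, labels) + alpha * F.mse_loss(s_feat, hint)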

Key words: artificial intelligence, knowledge distillation, activation map adaptation, model transfer, image classification

CLC Number: TP391