Journal of Jilin University Science Edition ›› 2026, Vol. 64 ›› Issue (2): 394-402.


Low-Resource Speech Keyword Spotting Based on Joint Knowledge Transfer

HUANG Jinxin1, HE Qianhua1, ZHENG Ruowei1, YANG Mingru1, WANG Wenwu2   

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, China;
    2. Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK
  • Received:2024-08-14 Online:2026-03-26 Published:2026-03-26

Abstract: Aiming at the problem of low accuracy in speech keyword spotting under low-resource conditions, we proposed a detection method combining unsupervised feature extraction with supervised model parameter transfer. Firstly, a deep feature extraction network was trained on large-scale unlabeled speech data, and the extracted features were fused with acoustic spectrogram features to enhance their robustness to varying acoustic environments. Secondly, the decision network was pre-trained on rich labeled data from the source domain, and decision knowledge was introduced through parameter transfer to solve the model convergence difficulty caused by insufficient training data in the target domain. Finally, the entire network was fine-tuned with a very small amount of target-domain data. Experimental results on Hakka and Cantonese datasets show that this method significantly outperforms single transfer strategies: in the Hakka task, the false rejection rate is reduced to 11.77%, and the maximum term-weighted value is improved to 0.7346. These results demonstrate that the proposed method can effectively alleviate data scarcity and significantly improve detection performance for low-resource languages.
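The transfer scheme described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the feature dimensions, the fusion-by-concatenation operator, and the tiny logistic-regression stand-in for the decision network are hypothetical simplifications, not the paper's actual architecture. The sketch shows the three steps: fuse learned features with spectrogram features, pre-train the decision model on plentiful source-domain labels, then initialize the target model from the source parameters and fine-tune on a very small target-domain set.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(deep_feats, spec_feats):
    # Feature fusion by concatenation along the feature axis
    # (a simple, common choice; the paper's exact fusion operator
    # is not specified in the abstract).
    return np.concatenate([deep_feats, spec_feats], axis=1)

def train_logreg(X, y, w=None, b=0.0, lr=0.1, epochs=200):
    # Tiny logistic-regression stand-in for the "decision network".
    # Passing in (w, b) implements parameter transfer: training
    # starts from the source-domain solution instead of scratch.
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid scores
        g = p - y                                # gradient of log-loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# Source domain: rich labeled data (synthetic stand-in).
Xs = rng.normal(size=(500, 8))
w_true = rng.normal(size=8)
ys = (Xs @ w_true > 0).astype(float)
w_src, b_src = train_logreg(Xs, ys)

# Target domain: only 20 labeled examples, shifted distribution.
Xt = rng.normal(size=(20, 8)) + 0.3
yt = (Xt @ w_true > 0).astype(float)

# Parameter transfer + fine-tuning: initialize from the source
# parameters, then adapt briefly on the scarce target data.
w_ft, b_ft = train_logreg(Xt, yt, w=w_src.copy(), b=b_src, epochs=50)
```

In this toy setting the fine-tuned model inherits the source-domain decision knowledge through its initialization, so a few epochs on 20 target examples suffice, which is the intuition behind transferring decision-network parameters when target-domain data are too scarce for stable training from scratch.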

Key words: speech keyword spotting, deep learning, low-resource, joint knowledge transfer

CLC Number: 

  • TP391