Journal of Jilin University Science Edition ›› 2024, Vol. 62 ›› Issue (3): 674-682.

End-to-End Speech Recognition Based on Threshold-Based BPE-Dropout Multi-task Learning

MA Jian, DUO Lin, WEI Guixiang, TANG Jian   

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  • Received: 2023-06-16 Online: 2024-05-26 Published: 2024-05-26

Abstract: To address the problem of unknown (out-of-vocabulary) words in speech recognition tasks, we propose a multi-task learning speech recognition method based on threshold-based BPE-dropout. The method uses a stochastic byte pair encoding algorithm (BPE-dropout) and introduces a word-count threshold strategy when forming sub-words. The sub-words serve as modeling units, and the encoder adopts the Conformer structure, combined with connectionist temporal classification (CTC) and an attention mechanism. To further improve model performance, dynamic parameters are introduced to adjust the loss function, and multi-task training and decoding are performed simultaneously. Experimental results show that using sub-words as modeling units effectively addresses the unknown-word problem, and that the multi-task learning framework further improves recognition performance. On the public datasets THCHS30 and ST-CMDS, the model achieves more than 95% recognition accuracy.
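
The stochastic sub-word segmentation mentioned in the abstract can be illustrated with a minimal Python sketch of generic BPE-dropout: at each step, every applicable merge is skipped with some probability, so the same word can be split into different sub-word sequences across training passes. The function name `bpe_dropout_segment`, the toy merge table, and the `dropout` value are assumptions for illustration only. The abstract does not specify the paper's threshold rule, so it is not modeled here, and this should not be read as the authors' implementation.

```python
import random


def bpe_dropout_segment(word, merges, dropout=0.1, rng=None):
    """Split `word` into sub-words using BPE merges with random dropout.

    merges  : list of (left, right) pairs, highest priority first
              (a hypothetical toy table, not a trained vocabulary).
    dropout : probability of skipping each candidate merge at each step.
    """
    rng = rng or random.Random()
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        # Collect adjacent pairs that have a merge rule and survive dropout.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in ranks and rng.random() >= dropout
        ]
        if not candidates:
            break  # nothing left to merge in this pass
        _, i = min(candidates)  # apply the highest-priority surviving merge
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


toy_merges = [("l", "o"), ("lo", "w"), ("e", "r")]
deterministic = bpe_dropout_segment("lower", toy_merges, dropout=0.0)
characters_only = bpe_dropout_segment("lower", toy_merges, dropout=1.0)
sampled = bpe_dropout_segment("lower", toy_merges, dropout=0.5,
                              rng=random.Random(0))
```

With `dropout=0.0` the function reduces to ordinary greedy BPE, and with `dropout=1.0` it falls back to single characters. Intermediate values yield varied segmentations, and because every word can always be decomposed into smaller units, sub-word modeling avoids out-of-vocabulary tokens.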

Key words: speech recognition, multi-task learning, byte pair encoding, dynamic adjustment parameter

CLC Number: TP391