Journal of Jilin University Science Edition ›› 2024, Vol. 62 ›› Issue (3): 674-682.
MA Jian, DUO Lin, WEI Guixiang, TANG Jian
Abstract: To address the problem of unknown (out-of-vocabulary) words in speech recognition tasks, we propose a threshold-based BPE-dropout multi-task learning method for speech recognition. The method uses a stochastic byte pair encoding (BPE) algorithm and, when forming sub-words, introduces a strategy with a word-count threshold. The sub-words serve as modeling units, and the encoder adopts the Conformer architecture, combined with connectionist temporal classification (CTC) and an attention mechanism. To further improve model performance, dynamic parameters are introduced to adjust the loss function, and multi-task training and decoding are performed jointly. The experimental results show that using sub-words as modeling units effectively solves the unknown-word problem, and that the multi-task learning framework further improves the recognition performance of the model. On the public datasets THCHS30 and ST-CMDS, the model achieves a recognition accuracy of more than 95%.
Key words: speech recognition, multi-task learning, byte pair encoding, dynamic adjustment parameter
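As an illustration of the stochastic sub-word segmentation the abstract refers to, the following is a minimal Python sketch of generic BPE-dropout, not the authors' implementation. The function name `bpe_dropout_segment`, the toy merge table, and the English example word are assumptions made for illustration only. The abstract does not specify how the word-count threshold is applied, so the sketch leaves it out.

```python
import random


def bpe_dropout_segment(word, merge_ranks, dropout=0.1, rng=None):
    """Segment `word` into sub-words with BPE, randomly skipping merges.

    merge_ranks maps a pair of adjacent symbols to its priority
    (lower rank = learned earlier = applied first). This table and the
    function name are hypothetical, used only for illustration.
    """
    rng = rng or random.Random()
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        # Collect adjacent pairs that are in the merge table and survive dropout.
        candidates = [
            (merge_ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in merge_ranks and rng.random() >= dropout
        ]
        if not candidates:
            break  # nothing left to merge (or every merge was dropped)
        _, i = min(candidates)  # apply the highest-priority surviving merge
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Toy merge table (assumed, not taken from the paper).
toy_merges = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2, ("low", "er"): 3}
full = bpe_dropout_segment("lower", toy_merges, dropout=0.0)
chars = bpe_dropout_segment("lower", toy_merges, dropout=1.0)
sampled = bpe_dropout_segment("lower", toy_merges, dropout=0.5, rng=random.Random(0))
```

With `dropout=0.0` the sketch reduces to ordinary deterministic BPE, and with `dropout=1.0` it falls back to single characters. Intermediate values yield varying segmentations of the same word, which is what exposes the model to multiple sub-word decompositions during training.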
MA Jian, DUO Lin, WEI Guixiang, TANG Jian. End-to-End Speech Recognition Based on Threshold-Based BPE-Dropout Multi-task Learning[J]. Journal of Jilin University Science Edition, 2024, 62(3): 674-682.
URL: http://xuebao.jlu.edu.cn/lxb/EN/
http://xuebao.jlu.edu.cn/lxb/EN/Y2024/V62/I3/674