End-to-End Speech Recognition Based on Threshold-Based BPE-Dropout Multi-task Learning

Abstract

To address the problem of out-of-vocabulary (unknown) words in speech recognition, we propose a multi-task learning speech recognition method based on threshold-based BPE-dropout. The method uses a stochastic byte pair encoding (BPE) algorithm and introduces a word-count threshold strategy when forming sub-words. The sub-words serve as modeling units, and the encoder adopts the Conformer structure, combined with connectionist temporal classification (CTC) and an attention mechanism. To further improve model performance, dynamic parameters are introduced to adjust the loss function, and multi-task training and decoding are performed jointly. Experimental results show that using sub-words as modeling units effectively addresses the unknown-word problem, and that the multi-task learning framework further improves recognition performance. On the public datasets THCHS30 and ST-CMDS, the model achieves a recognition accuracy of more than 95%.
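As a rough illustration of the stochastic segmentation step, the sketch below shows plain BPE-dropout in Python: each candidate merge is skipped with some probability, so the same word can be split into different sub-words across training passes. The function name `bpe_dropout_segment`, the toy merge table, and the `dropout` parameter are assumptions made for this example. The abstract does not specify how the threshold rule is applied, so the threshold is not modeled here.

```python
import random


def bpe_dropout_segment(word, merges, dropout=0.1, rng=None):
    """Split `word` into sub-words using a ranked BPE merge table.

    Each candidate merge is skipped with probability `dropout`.
    dropout=0.0 gives deterministic BPE; dropout=1.0 leaves single characters.
    Hypothetical helper: it does not implement the paper's threshold rule.
    """
    rng = rng or random.Random()
    # Earlier entries in `merges` have higher priority (lower rank).
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Adjacent pairs that are in the merge table and survive dropout.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in ranks and rng.random() >= dropout
        ]
        if not candidates:
            break  # nothing left to merge in this pass
        _, i = min(candidates)  # apply the highest-priority surviving merge
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Toy merge table, invented for illustration only.
toy_merges = [("l", "o"), ("lo", "w"), ("e", "r")]
no_dropout = bpe_dropout_segment("lower", toy_merges, dropout=0.0)
full_dropout = bpe_dropout_segment("lower", toy_merges, dropout=1.0)
```

With an intermediate `dropout` value, repeated calls on the same word can return different segmentations, which is the source of the regularization effect; passing a seeded `random.Random` makes a run reproducible.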

Key words: speech recognition, multi-task learning, byte pair encoding, dynamic adjustment parameter

CLC Number: TP391

MA Jian, DUO Lin, WEI Guixiang, TANG Jian. End-to-End Speech Recognition Based on Threshold-Based BPE-Dropout Multi-task Learning [J]. Journal of Jilin University Science Edition, 2024, 62(3): 674-682.

