Journal of Jilin University (Science Edition) ›› 2024, Vol. 62 ›› Issue (3): 674-682.


End-to-End Speech Recognition Based on Threshold-Based BPE-Dropout Multi-task Learning

MA Jian, DUO Lin, WEI Guixiang, TANG Jian

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  • Received: 2023-06-16 Online: 2024-05-26 Published: 2024-05-26
  • Corresponding author: DUO Lin E-mail: duolin2003@126.com

Abstract: To address the out-of-vocabulary (OOV) word problem in speech recognition, we propose a threshold-based BPE-dropout multi-task learning method for speech recognition. The method uses a stochastic byte pair encoding (BPE) algorithm, BPE-dropout, and introduces a character-count threshold when forming sub-words. The resulting sub-words serve as modeling units, and a Conformer encoder is combined with connectionist temporal classification (CTC) and an attention mechanism. To further improve model performance, a dynamic parameter is introduced to adjust the loss function dynamically, and multi-task training and decoding are performed jointly. Experimental results show that using sub-words as modeling units effectively addresses the OOV problem and that the multi-task learning framework further improves recognition performance. On the public datasets THCHS30 and ST-CMDS, the model achieves a recognition accuracy of more than 95%.

Key words: speech recognition, multi-task learning, byte pair encoding, dynamic adjustment parameter

CLC number: 

  • TP391
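
The abstract describes the sub-word step only briefly, so the following is a minimal sketch of how BPE-dropout segmentation with a length threshold might look, not the authors' implementation. It assumes the threshold caps the number of characters in any merged sub-word, which is one possible reading of the abstract. The function name `bpe_dropout_segment`, the parameters `p_drop` and `max_len`, and the toy merge table are all hypothetical. In standard BPE-dropout, each candidate merge is skipped with some probability at every step, so the same text can be segmented differently across training passes.

```python
import random


def bpe_dropout_segment(text, merges, p_drop=0.1, max_len=None, rng=None):
    """Segment `text` into sub-words with BPE-dropout.

    merges : ordered list of (left, right) pairs, highest priority first
             (assumed to come from an ordinary BPE training run).
    p_drop : probability of skipping each candidate merge at each step.
    max_len: assumed threshold; a merge is rejected if the merged
             sub-word would exceed this many characters.
    """
    rng = rng or random.Random()
    ranks = {pair: i for i, pair in enumerate(merges)}
    tokens = list(text)  # start from single characters
    while len(tokens) > 1:
        candidates = []
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            if pair not in ranks:
                continue
            # Hypothetical threshold rule: cap the merged sub-word length.
            if max_len is not None and len(pair[0]) + len(pair[1]) > max_len:
                continue
            # Dropout: randomly skip this merge for the current step.
            if rng.random() < p_drop:
                continue
            candidates.append((ranks[pair], i))
        if not candidates:
            break  # nothing left to merge (or everything was dropped)
        _, i = min(candidates)  # apply the highest-priority surviving merge
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


if __name__ == "__main__":
    # Toy merge table, for illustration only.
    toy_merges = [("语", "音"), ("识", "别"), ("语音", "识别")]
    # Deterministic (no dropout), sub-words capped at 2 characters.
    print(bpe_dropout_segment("语音识别", toy_merges, p_drop=0.0, max_len=2))
    # ['语音', '识别']
    # With dropout, the segmentation varies from call to call.
    print(bpe_dropout_segment("语音识别", toy_merges, p_drop=0.5, max_len=2))
```

With `p_drop=0.0` the function reduces to ordinary greedy BPE, which is the usual choice at inference time. A nonzero `p_drop` during training exposes the model to several segmentations of the same transcript.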