Journal of Jilin University (Science Edition) ›› 2024, Vol. 62 ›› Issue (2): 320-330.


  • Corresponding author: JIANG Nan, E-mail: zgxj_jiangnan@126.com

Speech Recognition Based on Attention Mechanism and Spectrogram Feature Extraction

JIANG Nan1, PANG Yongheng1, GAO Shuang2   

  1. School of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang 110854, China;
    2. College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
  • Received: 2023-03-08 Online: 2024-03-26 Published: 2024-03-26


Abstract: The connectionist temporal classification (CTC) model relies on an output-independence assumption, depends strongly on an external language model, and requires a long training period. To address these problems, we proposed a speech recognition method based on the CTC model. Firstly, within the framework of a traditional acoustic model, a spectrogram feature extraction network based on an attention mechanism was trained using prior knowledge, which effectively improved the discriminability and robustness of speech features. Secondly, the spectrogram feature extraction network was spliced onto the front end of the CTC model, and the number of recurrent neural network layers in the model was reduced before retraining. Test results show that the improved model shortens the training time and effectively improves speech recognition accuracy.
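The attention-based re-weighting of spectrogram frames described in the abstract can be sketched as follows. This is a minimal NumPy illustration with hypothetical dimensions (100 frames, 80 mel bins) and a randomly initialized scoring vector; it is not the authors' actual network, which is trained with prior knowledge inside a full acoustic-model framework.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(spectrogram, w):
    """Score each time frame, turn the scores into an attention
    distribution, and return the attention-weighted frames."""
    scores = spectrogram @ w                 # one scalar score per frame, shape (T,)
    alpha = softmax(scores)                  # attention weights over frames, sum to 1
    weighted = spectrogram * alpha[:, None]  # emphasize informative frames
    return weighted, alpha

rng = np.random.default_rng(0)
spec = rng.standard_normal((100, 80))  # hypothetical spectrogram: 100 frames x 80 mel bins
w = rng.standard_normal(80)            # scoring vector (learned in practice, random here)
feats, alpha = attention_pool(spec, w)
```

In the paper's pipeline, the output of such an extractor would feed the CTC model's recurrent layers in place of raw spectrogram frames.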

Key words: speech recognition, CTC model, recurrent neural network, attention mechanism

CLC number: TP391