Journal of Jilin University (Science Edition) ›› 2024, Vol. 62 ›› Issue (4): 943-950.


Speech Recognition Method Based on Fusion Feature ADRMFCC

DUO Lin, MA Jian, WEI Guixiang, TANG Jian

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  • Received: 2023-07-12 Online: 2024-07-26 Published: 2024-07-26
  • Corresponding author: MA Jian E-mail: 2703729898@qq.com

Abstract: To address the low accuracy and poor robustness of speech recognition in complex noise environments, we propose a speech recognition method based on a fusion feature, the increase and decrease residual Mel cepstral coefficient (ADRMFCC). The method first uses the increase and decrease component method to screen the key speech features, then maps them into a Mel domain-residual domain spatial coordinate system to generate the increase and decrease residual Mel cepstral coefficients, and finally uses these fusion features to train an end-to-end model. Experimental results show that the method significantly improves speech recognition accuracy across different noise types and signal-to-noise ratio (SNR) conditions. At a low SNR of -5 dB, the speech recognition accuracy reaches 73.13%, and the average speech recognition accuracy under the other noise conditions reaches 88.67%, which demonstrates the effectiveness and robustness of the method.

Key words: speech recognition, residual Mel cepstral coefficient, feature screening, increase and decrease component method
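
The abstract reports accuracy at fixed signal-to-noise ratios such as -5 dB but does not say how the noisy test utterances were built. As a reading aid only, the sketch below shows the standard way to mix a noise recording into clean speech at a target SNR, using the definition SNR = 10 log10(P_speech / P_noise). The function names `mean_power`, `mix_at_snr`, and `measured_snr_db`, the synthetic sine "speech", and the Gaussian "noise" are all hypothetical and are not taken from the paper.

```python
import math
import random


def mean_power(samples):
    """Average power of a signal given as a list of floats."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Returns (mixed, scaled_noise). The noise is repeated cyclically
    so that it covers the full length of `speech`.
    """
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    # From snr_db = 10*log10(P_speech / (gain^2 * P_noise)), solve for gain.
    gain = math.sqrt(
        mean_power(speech) / (mean_power(noise) * 10.0 ** (snr_db / 10.0))
    )
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


def measured_snr_db(speech, scaled_noise):
    """SNR in dB between the clean speech and the noise actually added."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(scaled_noise))


# Illustrative signals only (assumption): a 220 Hz tone and white Gaussian noise.
random.seed(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(4000)]

mixed, scaled_noise = mix_at_snr(speech, noise, -5.0)
print(round(measured_snr_db(speech, scaled_noise), 2))
```

At -5 dB the added noise carries roughly three times the power of the speech, which is why this condition is treated as the hardest case in the reported results.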

CLC number: 

  • TP391