Journal of Jilin University (Science Edition) ›› 2026, Vol. 64 ›› Issue (2): 403-410.


  • Corresponding author: LIU Jun, E-mail: liujun@nepu.edu.cn

Adaptive Speech Enhancement Algorithm Based on Improved Multi-metric Optimization

FU Chunyu, LIU Jun   

  1. School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing 163318, Heilongjiang Province, China
  • Received: 2024-10-15  Online: 2026-03-26  Published: 2026-03-26



Abstract: To address the problem that multi-metric speech enhancement algorithms are easily disturbed by outliers during training, leading to unstable optimization, we propose an adaptive speech enhancement algorithm based on a multi-head attention mechanism. First, a multi-head attention structure is introduced into the intermediate layers of the discriminator network, strengthening the model's joint modeling of both the local features and the overall structure of the speech spectrum; this is combined with an online knowledge distillation strategy that shares information among multiple generators, improving collaborative optimization under multi-metric conditions. Second, to reduce the impact of outliers on training, the loss function is replaced with a logarithmic mean-squared error form, improving the model's stability and robustness. Experimental results on the public VoiceBank-DEMAND speech dataset show that the proposed method outperforms existing multi-metric speech enhancement models in terms of speech quality, background noise suppression, and speech intelligibility. Introducing an attention mechanism together with a stabilized loss function therefore significantly improves the overall performance of multi-metric speech enhancement algorithms.
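The abstract does not give the exact form of the logarithmic mean-squared error loss; a minimal sketch of one common stabilized variant, log(1 + MSE), illustrates why a logarithm damps the influence of outlier frames on training (the function and parameter names below are illustrative, not from the paper):

```python
import numpy as np

def mse(pred, target):
    """Plain mean-squared error between two signal arrays."""
    return float(np.mean((pred - target) ** 2))

def log_mse(pred, target, eps=1.0):
    """Logarithmic MSE: log(eps + MSE).

    The logarithm compresses large error values, so a few outlier
    frames raise the loss far less than under plain MSE, which
    stabilizes optimization. `eps` keeps the argument positive.
    """
    return float(np.log(eps + mse(pred, target)))

# A clean batch versus one contaminated by a single outlier sample:
pred   = np.zeros(4)
clean  = np.array([0.1, 0.1, 0.1, 0.1])
noisy  = np.array([0.1, 0.1, 0.1, 10.0])  # one outlier

# Under plain MSE the outlier inflates the loss by a factor of ~2500;
# under log-MSE the inflation is far smaller.
ratio_mse = mse(pred, noisy) / mse(pred, clean)
ratio_log = log_mse(pred, noisy) / log_mse(pred, clean)
```

The compression is the point: gradients with respect to outlier-dominated batches shrink by the same factor, so a few corrupted training examples cannot destabilize the multi-metric optimization.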

Key words: speech enhancement, frequency domain, multi-head attention mechanism, online knowledge distillation, logarithmic mean-squared error loss

CLC number: 

  • TP391