Journal of Jilin University (Science Edition) ›› 2026, Vol. 64 ›› Issue (2): 403-410.


  • Corresponding author: LIU Jun, E-mail: liujun@nepu.edu.cn

Adaptive Speech Enhancement Algorithm Based on Improved Multi-metric Optimization

FU Chunyu, LIU Jun   

  1. School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing 163318, Heilongjiang Province, China
  • Received: 2024-10-15  Online: 2026-03-26  Published: 2026-03-26



Abstract: To address the problem that multi-metric speech enhancement algorithms are easily disturbed by outliers during training, leading to unstable optimization, we propose an adaptive speech enhancement algorithm based on a multi-head attention mechanism. First, a multi-head attention structure is introduced into the intermediate layers of the discriminator network, strengthening the model's joint modeling of both the local features and the overall structure of the speech spectrum; this is combined with an online knowledge distillation strategy that shares information among multiple generators, improving collaborative optimization under multi-metric conditions. Second, to reduce the impact of outliers on training, the loss function is replaced with a logarithmic mean-squared error form, improving the model's stability and robustness. Experimental results on the public VoiceBank-DEMAND speech dataset show that the proposed method outperforms existing multi-metric speech enhancement models in terms of speech quality, background noise suppression, and speech intelligibility. Introducing an attention mechanism together with a stabilized loss function therefore significantly improves the overall performance of multi-metric speech enhancement algorithms.
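The abstract does not give the exact form of the logarithmic mean-squared error loss; a minimal sketch of one common stabilized variant, log(1 + MSE), illustrates why a logarithm damps the influence of outlier frames on training (the function and parameter names below are illustrative, not from the paper):

```python
import numpy as np

def mse(pred, target):
    """Plain mean-squared error between two signal arrays."""
    return float(np.mean((pred - target) ** 2))

def log_mse(pred, target, eps=1.0):
    """Logarithmic MSE: log(eps + MSE).

    The logarithm compresses large error values, so a few outlier
    frames raise the loss far less than under plain MSE, which
    stabilizes optimization. `eps` keeps the argument positive.
    """
    return float(np.log(eps + mse(pred, target)))

# A clean batch versus one contaminated by a single outlier sample:
pred   = np.zeros(4)
clean  = np.array([0.1, 0.1, 0.1, 0.1])
noisy  = np.array([0.1, 0.1, 0.1, 10.0])  # one outlier

# Under plain MSE the outlier inflates the loss by a factor of ~2500;
# under log-MSE the inflation is far smaller.
ratio_mse = mse(pred, noisy) / mse(pred, clean)
ratio_log = log_mse(pred, noisy) / log_mse(pred, clean)
```

The compression is the point: gradients with respect to outlier-dominated batches shrink by the same factor, so a few corrupted training examples cannot destabilize the multi-metric optimization.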

Key words: speech enhancement, frequency domain, multi-head attention mechanism, online knowledge distillation, logarithmic mean-squared error loss

CLC number: 

  • TP391