Low-Resource Speech Keyword Spotting Based on Joint Knowledge Transfer

Journal of Jilin University (Science Edition), 2026, Vol. 64, Issue 2: 394-402.

HUANG Jinxin¹, HE Qianhua¹, ZHENG Ruowei¹, YANG Mingru¹, WANG Wenwu²

1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, China
2. Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK

Received: 2024-08-14 Online: 2026-03-26 Published: 2026-03-26
Corresponding author: HE Qianhua, E-mail: eeqhhe@scut.edu.cn

Abstract

To address the low accuracy of speech keyword spotting under low-resource conditions, we propose a detection method that combines unsupervised feature extraction with supervised model parameter transfer. First, a deep feature extraction network is trained on large-scale unlabeled speech data, and the extracted features are fused with acoustic spectrogram features to make the features more robust to the acoustic environment. Second, the decision network is pre-trained on abundant labeled data from the source domain, and decision knowledge is introduced through parameter transfer, which addresses the difficulty of model convergence caused by insufficient training data in the target domain. Finally, the entire network is fine-tuned on a very small amount of target-domain data. Experimental results on Hakka and Cantonese datasets show that the method significantly outperforms single-transfer strategies: in the Hakka task, the false rejection rate is reduced to 11.77% and the maximum term-weighted value is raised to 0.7346. These results demonstrate that the proposed method effectively alleviates data scarcity and significantly improves detection performance for low-resource languages.

Key words: speech keyword spotting, deep learning, low-resource, joint knowledge transfer
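The abstract reports two metrics, the false rejection rate and the maximum term-weighted value (MTWV). The sketch below shows one common way to compute them, assuming the NIST STD 2006 convention TWV(θ) = 1 − mean over keywords of [P_miss + β·P_FA] with β = 999.9 and one non-target trial per second of speech. The source does not state the exact definition or β used in the paper, so the function names, the data layout, and the constant are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Assumption: NIST STD 2006 default weight; the paper's own value is not given.
BETA = 999.9

Detections = Dict[str, List[Tuple[float, bool]]]  # keyword -> [(score, is_correct)]


def false_rejection_rate(dets: Detections, n_true: Dict[str, int],
                         threshold: float) -> float:
    """Fraction of true keyword occurrences not detected at this threshold."""
    found = sum(ok for kw in n_true
                for score, ok in dets.get(kw, []) if score >= threshold)
    return 1.0 - found / sum(n_true.values())


def term_weighted_value(dets: Detections, n_true: Dict[str, int],
                        speech_seconds: float, threshold: float,
                        beta: float = BETA) -> float:
    """TWV at one threshold, averaged over keywords with n_true > 0."""
    cost = 0.0
    for kw, count in n_true.items():
        accepted = [ok for score, ok in dets.get(kw, []) if score >= threshold]
        n_correct = sum(accepted)
        n_false_alarm = len(accepted) - n_correct
        p_miss = 1.0 - n_correct / count
        # Non-target trials are approximated as one per second of speech.
        p_fa = n_false_alarm / (speech_seconds - count)
        cost += p_miss + beta * p_fa
    return 1.0 - cost / len(n_true)


def max_term_weighted_value(dets: Detections, n_true: Dict[str, int],
                            speech_seconds: float) -> Tuple[float, float]:
    """Sweep every observed score as a threshold; return (MTWV, threshold)."""
    thresholds = sorted({s for hits in dets.values() for s, _ in hits})
    return max((term_weighted_value(dets, n_true, speech_seconds, t), t)
               for t in thresholds)


# Toy data, made up purely for illustration (not from the paper).
toy_dets = {"a": [(0.9, True), (0.6, True), (0.4, False)],
            "b": [(0.8, True), (0.3, False)]}
toy_true = {"a": 2, "b": 2}
mtwv, best_threshold = max_term_weighted_value(toy_dets, toy_true, 10002.0)
```

On this toy data the sweep picks the threshold 0.6, where one of the four true occurrences is missed and no false alarm is accepted. Lowering the threshold further recovers no additional hits but admits false alarms, each of which is penalized heavily by β.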

CLC number: TP391

HUANG Jinxin, HE Qianhua, ZHENG Ruowei, YANG Mingru, WANG Wenwu. Low-Resource Speech Keyword Spotting Based on Joint Knowledge Transfer[J]. Journal of Jilin University (Science Edition), 2026, 64(2): 394-402.
