Broadcast Audio Language Identification Based on Improved GFCC Feature Parameters

Journal of Jilin University (Science Edition) ›› 2022, Vol. 60 ›› Issue (2): 417-424.

SHAO Yubin, CHEN Liang, LONG Hua, DU Qingzhi

School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China

Received: 2020-11-19 Online: 2022-03-26 Published: 2022-03-26
Corresponding author: SHAO Yubin, E-mail: shaoyubin@kust.edu.cn

Abstract

Abstract: In broadcast audio language identification, features unrelated to the language degrade the identification results. To address this problem, a language identification method based on improved Gammatone frequency cepstral coefficient (GFCC) feature parameters is proposed. The energy spectrum envelope of each frame is extracted to remove some speaker-related features, the result is filtered by a Gammatone filter bank, and the discrete cosine transform followed by cepstral liftering yields the improved GFCC feature parameters. Feature parameters extracted from broadcast audio signals were input into a hidden Markov model for training and testing. The results show that the proposed method effectively improves the accuracy of broadcast audio language identification and outperforms the currently used GFCC features and their derivatives.

Key words: broadcast audio language identification, energy spectrum envelope, cepstral liftering, improved Gammatone frequency cepstral coefficient
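
The abstract names the processing steps but not their parameters. As a rough illustration only, the chain (per-frame energy spectrum envelope, Gammatone filter bank, discrete cosine transform, cepstral liftering) might be sketched in Python with NumPy as below. Several choices here are assumptions and not taken from the paper: the envelope is estimated by cepstral smoothing, the Gammatone bank is a frequency-domain approximation on ERB-spaced center frequencies, and all function names and default values (400-sample frames, 32 filters, 13 coefficients, lifter length 22) are hypothetical.

```python
import numpy as np


def erb_space(f_low, f_high, n):
    """Return n center frequencies (Hz) equally spaced on the ERB-rate scale."""
    e_low = 21.4 * np.log10(1.0 + 0.00437 * f_low)
    e_high = 21.4 * np.log10(1.0 + 0.00437 * f_high)
    e = np.linspace(e_low, e_high, n)
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def gammatone_weights(n_filters, n_fft, sr, f_low=50.0, order=4):
    """Approximate squared magnitude responses of a Gammatone bank at rFFT bins.

    Assumption: a frequency-domain approximation, not a time-domain filter bank.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fc = erb_space(f_low, sr / 2.0, n_filters)[:, None]
    bw = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)  # bandwidth from the ERB
    return (1.0 + ((freqs[None, :] - fc) / bw) ** 2) ** (-order)


def spectral_envelope(power, n_keep=30):
    """Smooth each power spectrum by keeping only low-quefrency cepstral terms.

    Assumption: cepstral smoothing stands in for the paper's envelope estimator.
    """
    ceps = np.fft.irfft(np.log(power + 1e-10), axis=-1)
    n = ceps.shape[-1]
    keep = np.zeros(n)
    keep[:n_keep] = 1.0
    keep[n - n_keep + 1:] = 1.0  # mirrored half of the symmetric cepstrum
    return np.exp(np.fft.rfft(ceps * keep, axis=-1).real)


def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, first n_out coefficients."""
    m = x.shape[-1]
    k = np.arange(n_out)[:, None]
    n = np.arange(m)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * m))
    return x @ basis.T


def improved_gfcc(signal, sr, frame_len=400, hop=160, n_fft=512,
                  n_filters=32, n_ceps=13, lift=22):
    """Return a (frames, n_ceps) matrix of liftered, envelope-based GFCCs."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=-1)) ** 2
    envelope = spectral_envelope(power)
    energies = envelope @ gammatone_weights(n_filters, n_fft, sr).T
    ceps = dct_ii(np.log(energies + 1e-10), n_ceps)
    # Sinusoidal cepstral lifter: w[n] = 1 + (L / 2) * sin(pi * n / L)
    lifter = 1.0 + (lift / 2.0) * np.sin(np.pi * np.arange(n_ceps) / lift)
    return ceps * lifter
```

With these default settings, one second of 16 kHz audio yields a 98 × 13 feature matrix. In a system like the one the abstract describes, such per-frame vectors would then be used to train and test hidden Markov models; that stage is not sketched here.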

CLC Number: TP391

SHAO Yubin, CHEN Liang, LONG Hua, DU Qingzhi. Broadcast Audio Language Identification Based on Improved GFCC Feature Parameters[J]. Journal of Jilin University (Science Edition), 2022, 60(2): 417-424.
