Journal of Jilin University (Science Edition) ›› 2022, Vol. 60 ›› Issue (2): 417-424.


Broadcast Audio Language Identification Based on Improved GFCC Feature Parameters

SHAO Yubin, CHEN Liang, LONG Hua, DU Qingzhi

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  • Received: 2020-11-19 Online: 2022-03-26 Published: 2022-03-26
  • Corresponding author: SHAO Yubin E-mail: shaoyubin@kust.edu.cn

Abstract: In broadcast audio language identification, features unrelated to the language affect the identification results. To address this problem, a language identification method based on improved Gammatone frequency cepstral coefficient (GFCC) feature parameters is proposed. The energy spectrum envelope of each frame is extracted to remove part of the speaker-related features. The envelope is then filtered by a Gammatone filter bank, transformed by the discrete cosine transform, and subjected to cepstral liftering to obtain the improved GFCC feature parameters. Feature parameters extracted from broadcast audio signals are input into a hidden Markov model for training and testing. The identification results show that the proposed method effectively improves the accuracy of broadcast audio language identification and outperforms the currently used GFCC features and their derivative methods.

Key words: broadcast audio language identification, energy spectrum envelope, cepstral liftering, improved Gammatone frequency cepstral coefficients

CLC number: 

  • TP391
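
The last two stages named in the abstract, the discrete cosine transform followed by cepstral liftering, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the lifter formula, the number of filter-bank channels, or the compression applied to the band energies, so the sinusoidal (HTK-style) lifter, the 32 channels, the 13 retained coefficients, the log compression, and all function names below are assumptions chosen for illustration. The energy spectrum envelope extraction and the Gammatone filter bank itself are not shown.

```python
import math


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II of `values`, keeping the first `num_coeffs` terms."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def sinusoidal_lifter(num_coeffs, lift=22):
    """Lifter weights w[k] = 1 + (lift / 2) * sin(pi * k / lift).

    This is a commonly used form and is an assumption here; the paper's
    actual lifter is not stated in the abstract.
    """
    return [1.0 + (lift / 2.0) * math.sin(math.pi * k / lift)
            for k in range(num_coeffs)]


def liftered_cepstrum(log_band_energies, num_coeffs=13, lift=22):
    """Apply the DCT to compressed band energies, then weight each coefficient."""
    cepstrum = dct_ii(log_band_energies, num_coeffs)
    weights = sinusoidal_lifter(num_coeffs, lift)
    return [c * w for c, w in zip(cepstrum, weights)]


# Hypothetical log-compressed outputs of a 32-channel filter bank for one frame.
frame = [math.log(1.0 + band) for band in range(32)]
coeffs = liftered_cepstrum(frame)
```

In this sketch the lifter leaves the zeroth coefficient unchanged and gives the largest weight to the mid-order coefficients, which is the usual motivation for liftering: rescaling the cepstral coefficients so that higher-order terms are not dominated by the low-order ones.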