Optimal setting method for frame length and frame shift of interframe difference phase spectrum

Journal of Jilin University (Engineering and Technology Edition) ›› 2013, Vol. 43 ›› Issue (Suppl. 1): 6-10.

WANG Jin-fang, GUO Ming, NIE Xin-li

College of Communication Engineering, Jilin University, Changchun 130012, China

Received: 2012-05-25 Published: 2013-06-01
About the author: WANG Jin-fang (1969-), male, associate professor. Research interest: speech signal processing. E-mail: jinfangw@163.com

Abstract

The phase of the short-time Fourier transform is often ignored in speaker recognition algorithms because its functional expression is difficult to handle mathematically. Feature extraction based on the interframe difference phase spectrum seeks a compromise between two requirements: the ability to detect sudden phase changes and the level of phase noise. However, choosing the frame length and frame shift by empirical analysis alone lacks a theoretical basis, so this paper proposes setting them optimally with a mutual information criterion. Speaker recognition experiments on the TIMIT speech corpus show that recognition performance with the optimized frame length and frame shift is clearly better than with the empirically chosen parameters.

Key words: speaker recognition, interframe difference phase spectrum, mutual information criterion
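The abstract does not give the feature definition or the mutual information estimator, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the interframe difference phase is the wrapped phase difference of one DFT bin between consecutive frames, that a periodic Hamming window is used, and that mutual information between the quantized feature and the speaker label is estimated from a plain histogram. All function names, the choice of bin k, and the bin count are hypothetical and are not taken from the paper.

```python
import cmath
import math
from collections import Counter


def dft_bin(frame, k):
    """DFT coefficient k of a frame under a periodic Hamming window (assumed)."""
    n_len = len(frame)
    return sum(
        (0.54 - 0.46 * math.cos(2 * math.pi * n / n_len)) * s
        * cmath.exp(-2j * math.pi * k * n / n_len)
        for n, s in enumerate(frame)
    )


def delta_phase(signal, frame_len, frame_shift, k):
    """Wrapped phase difference of bin k between consecutive frames."""
    starts = range(0, len(signal) - frame_len + 1, frame_shift)
    coeffs = [dft_bin(signal[s:s + frame_len], k) for s in starts]
    # The phase of X_t * conj(X_{t-1}) is the phase difference in [-pi, pi].
    return [cmath.phase(c1 * c0.conjugate())
            for c0, c1 in zip(coeffs, coeffs[1:])]


def mutual_information(values, labels, n_bins=8):
    """Histogram estimate of I(feature; label) in bits (assumed estimator)."""
    width = 2 * math.pi / n_bins
    cells = [min(int((v + math.pi) / width), n_bins - 1) for v in values]
    n = len(cells)
    joint = Counter(zip(cells, labels))
    p_cell = Counter(cells)
    p_label = Counter(labels)
    return sum(
        c / n * math.log2(c * n / (p_cell[x] * p_label[y]))
        for (x, y), c in joint.items()
    )


def select_frame_params(signals_by_speaker, candidates, k=1):
    """Return the (frame_len, frame_shift) candidate with the highest MI."""
    def score(params):
        frame_len, frame_shift = params
        values, labels = [], []
        for speaker, signal in signals_by_speaker.items():
            feats = delta_phase(signal, frame_len, frame_shift, k)
            values.extend(feats)
            labels.extend([speaker] * len(feats))
        return mutual_information(values, labels)
    return max(candidates, key=score)
```

A histogram estimate is biased when few frames are available, and the number of frames changes with the frame shift, so a practical comparison across candidates would probably need a bias-corrected or model-based estimator.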

CLC number: TN912

Cite this article: WANG Jin-fang, GUO Ming, NIE Xin-li. Optimal setting method for frame length and frame shift of interframe difference phase spectrum[J]. Journal of Jilin University (Engineering and Technology Edition), 2013, 43(Suppl. 1): 6-10.

