Journal of Jilin University (Engineering and Technology Edition) ›› 2013, Vol. 43 ›› Issue (Suppl. 1): 6-10.
WANG Jin-fang (王金芳), GUO Ming (虢明), NIE Xin-li (聂新礼)
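Abstract:
Most speaker recognition algorithms ignore the phase component of the short-time Fourier transform, because its functional expression is difficult to handle mathematically. Feature extraction based on the inter-frame differential phase spectrum aims at a trade-off between two requirements: the ability to detect abrupt phase changes, and low phase noise. Choosing the frame length and frame shift by empirical analysis alone, however, lacks a theoretical basis, so this paper proposes setting them optimally with a mutual information criterion. Speaker recognition experiments on the TIMIT corpus show that recognition performance with the optimal frame length and frame shift is clearly better than with the empirically chosen parameters.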
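The abstract does not give the feature definition or the exact form of the criterion, so the following is only a minimal sketch of one plausible reading. It assumes the inter-frame differential phase is the wrapped phase difference between consecutive short-time spectra, and that the mutual information between a single quantized phase feature and the speaker label is estimated from a histogram. The function names, the Hann window, the number of histogram bins, and the use of a single frequency bin as a scalar feature are all illustrative assumptions, not the authors' method.

```python
import cmath
import math
from collections import Counter


def delta_phase_frames(signal, frame_len, frame_shift):
    """Wrapped phase difference between consecutive short-time spectra."""
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        # Periodic Hann window (an assumed choice), then a naive DFT
        # over the non-negative frequency bins.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len))
                    for n, x in enumerate(frame)]
        spectra.append([
            sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed))
            for k in range(frame_len // 2 + 1)
        ])
    # The angle of X_t * conj(X_{t-1}) is the phase difference in [-pi, pi].
    return [[cmath.phase(cur * prev.conjugate()) for cur, prev in zip(b, a)]
            for a, b in zip(spectra, spectra[1:])]


def mutual_information(features, labels, n_bins=8):
    """Histogram estimate of I(F; Y) in bits for a scalar feature in [-pi, pi]."""
    bins = [min(int((f + math.pi) / (2 * math.pi) * n_bins), n_bins - 1)
            for f in features]
    total = len(bins)
    joint = Counter(zip(bins, labels))
    count_bin = Counter(bins)
    count_label = Counter(labels)
    return sum((c / total) * math.log2(c * total / (count_bin[b] * count_label[y]))
               for (b, y), c in joint.items())


def best_framing(utterances, labels, candidates, bin_index=1):
    """Pick the (frame_len, frame_shift) pair with the highest estimated MI."""
    scores = {}
    for frame_len, frame_shift in candidates:
        feats, labs = [], []
        for sig, lab in zip(utterances, labels):
            for frame in delta_phase_frames(sig, frame_len, frame_shift):
                feats.append(frame[bin_index])  # hypothetical scalar feature
                labs.append(lab)
        scores[(frame_len, frame_shift)] = mutual_information(feats, labs)
    return max(scores, key=scores.get), scores
```

Calling `best_framing` with a list of utterances, their speaker labels, and a list of candidate `(frame_len, frame_shift)` pairs returns the pair with the largest estimated mutual information together with all scores. A real experiment would use a fast FFT and a multivariate estimator over the full feature vector; the naive DFT here only keeps the sketch dependency-free.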