Latent Semantic Analysis Language Model and Its Application in Chinese Large Vocabulary Continuous Speech Recognition

WU Xi-hong, WU Hao, GAO Qin, LIN Xiao-jun, WANG Xin-hao

National Laboratory on Machine Perception, College of Information Science and Technology, Peking University, Beijing 100871, China

Received: 2006-05-28 Revised: 1900-01-01 Online: 2006-08-26 Published: 2006-08-26
Contact: WU Xi-hong

Abstract

Abstract: Integrating high-level semantic knowledge into speech recognition has been an active research topic in the field. Latent semantic analysis (LSA) can efficiently model long-span correlations between words in a language. How to use LSA in language modeling so that it guides the search procedure more accurately remains a pressing task. We analyze the major problems in the modeling and decoding procedures of the LSA language model, implement the model, and propose a method for integrating it into the first-pass decoding framework of a continuous speech recognition system. Experimental results show that the proposed method significantly improves the performance of large vocabulary continuous speech recognition for Chinese.

Key words: language model, vector space model, latent semantic analysis, speech recognition
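
The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea behind an LSA language model, not the authors' method. It assumes that low-dimensional word vectors are already available (in practice they would come from a truncated SVD of a word-document co-occurrence matrix), scores each candidate word by its cosine similarity to a pseudo-document built from the history, and combines that score with an n-gram probability by linear interpolation. The vectors in `WORD_VECS`, the weight `lam`, and all function names are hypothetical and chosen for illustration.

```python
import math

# Hypothetical 2-D LSA word vectors. The values are made up for illustration;
# a real system would derive them from an SVD of a word-document matrix.
WORD_VECS = {
    "stock":  (0.9, 0.1),
    "market": (0.8, 0.2),
    "bank":   (0.7, 0.3),
    "river":  (0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def history_vector(history):
    """Pseudo-document vector: sum of the vectors of known history words."""
    dim = len(next(iter(WORD_VECS.values())))
    acc = [0.0] * dim
    for word in history:
        for i, x in enumerate(WORD_VECS.get(word, (0.0,) * dim)):
            acc[i] += x
    return acc


def lsa_probs(history):
    """Turn cosine similarities into a distribution over the vocabulary."""
    h = history_vector(history)
    # Shift cosine from [-1, 1] to [0, 1] so that every score is non-negative.
    scores = {w: (1.0 + cosine(v, h)) / 2.0 for w, v in WORD_VECS.items()}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def interpolate(p_ngram, p_lsa, lam=0.8):
    """Linear interpolation of n-gram and LSA probabilities (assumed scheme)."""
    return {w: lam * p_ngram[w] + (1.0 - lam) * p_lsa[w] for w in p_ngram}


# Usage: a uniform n-gram distribution is reweighted by the semantic history.
p_ngram = {w: 0.25 for w in WORD_VECS}
p_combined = interpolate(p_ngram, lsa_probs(["stock", "market"]))
```

With a finance-related history, the combined distribution gives "bank" more probability than "river" even though the n-gram part treats them equally, which is the kind of long-span effect the abstract refers to. Linear interpolation is only one possible combination scheme and is used here because it keeps the result normalized.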

CLC number: TP391

WU Xi-hong, WU Hao, GAO Qin, LIN Xiao-jun, WANG Xin-hao. Latent Semantic Analysis Language Model and Its Application in Chinese Large Vocabulary Continuous Speech Recognition[J]. J4, 2006, 44(06): 16-20.
