• Computer Science

Latent Semantic Analysis Language Model and Its Application in Chinese Large Vocabulary Continuous Speech Recognition

WU Xi-hong, WU Hao, GAO Qin, LIN Xiao-jun, WANG Xin-hao

  1. National Laboratory on Machine Perception, College of Information Science and Technology, Peking University, Beijing 100871, China
  • Received: 2006-05-28 Revised: 1900-01-01 Online: 2006-08-26 Published: 2006-08-26
  • Corresponding author: WU Xi-hong

Abstract: Integrating high-level semantic knowledge into speech recognition has been an active research topic. Latent semantic analysis (LSA) can efficiently model long-span correlations between words in a language. How to exploit LSA in language modeling so that it guides the search procedure more accurately remains an open problem. We analyze the major problems in the modeling and decoding procedures of the LSA language model, implement LSA language modeling, and propose a method for integrating the LSA language model into the first-pass decoding framework of a continuous speech recognition system. Experimental results show that the proposed method effectively improves the performance of large-vocabulary continuous speech recognition for Chinese.

Key words: language model, vector space model, latent semantic analysis, speech recognition

CLC number (Chinese Library Classification):

  • TP391