Journal of Jilin University (Science Edition)

• Computer Science •

Method of Recognizing Unknown Words by Building a Single-Character Word List

于童, 刘淑芬   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012
  • Received: 2014-07-11 Online: 2015-03-26 Published: 2015-03-24
  • Corresponding author: 刘淑芬 E-mail: liusf@jlu.edu.cn

Method of Recognizing Unknown Words by Building a Single-Word Dictionary

YU Tong, LIU Shufen   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2014-07-11 Online: 2015-03-26 Published: 2015-03-24
  • Contact: LIU Shufen E-mail: liusf@jlu.edu.cn

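Abstract:

Current Chinese word segmentation techniques depend mainly on a dictionary of common words, and such a dictionary yields a low recognition rate for unknown words. To address this problem, a double-dictionary method for recognizing unknown words is proposed: a common-word dictionary and a single-character word dictionary are built and combined for segmentation, which effectively resolves the low efficiency of unknown-word recognition. Experiments show that building a single-character word list can achieve an unknown-word recognition accuracy above 90%.

Key words: single-character word list, unknown words, Chinese word segmentation, double-dictionary method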

Abstract:

Chinese word segmentation is an important task in information processing. Current Chinese word segmentation techniques rely mainly on a common-word dictionary, which has little capacity to recognize unknown words. The authors propose a double-dictionary method for recognizing unknown words: a common-word dictionary and a single-word dictionary are built and then combined for segmentation, which addresses the low efficiency of unknown-word recognition. Experiments show that the recognition accuracy for unknown words can exceed 90%.
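
The abstract does not spell out how the two dictionaries are combined, so the following Python sketch is only one plausible reading, not the authors' algorithm. It assumes that text is first cut by forward maximum matching against the common-word dictionary, and that consecutive leftover characters not listed in the single-word dictionary are then merged into one unknown-word candidate. The function names, the `max_len` parameter, and the tiny example dictionaries are all hypothetical.

```python
def forward_max_match(text, common_words, max_len=4):
    """Greedy forward maximum matching against the common-word dictionary.

    Characters that match no dictionary word are emitted one by one.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in common_words:
                tokens.append(piece)
                i += size
                break
    return tokens


def merge_unknown(tokens, common_words, single_char_words):
    """Merge runs of leftover single characters into unknown-word candidates.

    Assumption: a character listed in single_char_words can stand alone as a
    word, so it ends the current run instead of joining it.
    """
    result, run = [], []

    def flush():
        if len(run) > 1:
            result.append(("".join(run), "unknown"))
        elif run:
            result.append((run[0], "char"))
        run.clear()

    for tok in tokens:
        leftover = (
            len(tok) == 1
            and tok not in single_char_words
            and tok not in common_words
        )
        if leftover:
            run.append(tok)
        else:
            flush()
            result.append((tok, "known"))
    flush()
    return result


def segment(text, common_words, single_char_words, max_len=4):
    """Two-dictionary segmentation: match common words, then merge leftovers."""
    tokens = forward_max_match(text, common_words, max_len)
    return merge_unknown(tokens, common_words, single_char_words)


# Hypothetical toy dictionaries, for illustration only.
common = {"长春", "学习"}
singles = {"在", "的", "是"}
print(segment("于童在长春学习", common, singles))
```

With these toy dictionaries, the personal name 于童 is absent from the common-word dictionary and neither character is listed as a standalone word, so the two characters are merged and tagged `unknown`, while 在 is kept as a separate word because it appears in the single-word dictionary.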

Key words: singleword dictionary, unknown words, Chinese word segmentation, doubledictionary

CLC Number:

  • TP391.12