构建单字词表识别未登录词的方法

吉林大学学报(理学版)

于童, 刘淑芬

吉林大学计算机科学与技术学院, 长春 130012

收稿日期:2014-07-11 出版日期:2015-03-26 发布日期:2015-03-24
通讯作者: 刘淑芬 E-mail:liusf@jlu.edu.cn

Method of Recognizing Unknown Words by Building Single-Word Dictionary

YU Tong, LIU Shufen

College of Computer Science and Technology, Jilin University, Changchun 130012, China

Received: 2014-07-11 Online: 2015-03-26 Published: 2015-03-24
Contact: LIU Shufen E-mail: liusf@jlu.edu.cn

摘要/Abstract

摘要：

针对目前中文分词技术主要依赖于常用词词典, 而词典对未登录词识别率较低的问题, 提出一种用双词典识别未登录词的方法, 即构建一个常用词词典和一个单字词词典, 二者相互结合进行分词, 有效解决了对未登录词识别效率偏低的问题. 实验表明, 采用构建单字词表法对未登录词的识别准确率可达90%以上.

关键词: 单字词表, 未登录词, 中文分词, 双词典法

Abstract:

Chinese word segmentation is an important task in information processing. Current Chinese word segmentation techniques rely mainly on a common-word dictionary, which has a low recognition rate for unknown (out-of-vocabulary) words. This paper proposes a double-dictionary method for recognizing unknown words: a common-word dictionary and a single-word dictionary (a dictionary of single-character words) are built and then combined during segmentation, which addresses the low recognition rate for unknown words. Experiments show that the single-word dictionary method recognizes unknown words with an accuracy above 90%.

Key words: singleword dictionary, unknown words, Chinese word segmentation, doubledictionary

CLC number:

TP391.12

于童, 刘淑芬. 构建单字词表识别未登录词的方法[J]. 吉林大学学报(理学版), 2015, 53(02): 307-310.

YU Tong, LIU Shufen. Method of Recognizing Unknown Words by Building Single-Word Dictionary[J]. Journal of Jilin University (Science Edition), 2015, 53(02): 307-310.