

Crossing Ambiguity Segmentation Based on Statistical Rules

ZHAI Feng-wen, HE Feng-ling, ZUO Wan-li

  1. (College of Software, Jilin University, Changchun 130012, China)
  • Received: 2005-06-20 Online: 2006-03-26 Published: 2006-03-26
  • Corresponding author: ZUO Wan-li

Abstract: Chinese word segmentation is the foundation of Chinese information processing, and ambiguity is one of its main difficulties. Crossing ambiguity accounts for more than 90% of all ambiguity cases, so resolving it is a central problem in Chinese word segmentation research. Based on repeated experiments and analysis of their results, five rules are proposed, together with an algorithm built on these rules for segmenting strings that contain crossing ambiguity. Experimental results show that DSfenci, a word segmentation system implemented with this algorithm, resolves crossing ambiguities with an accuracy that reaches 95.22%.

Key words: crossing ambiguity, rules, statistics
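
For readers unfamiliar with the term, a crossing ambiguity arises when two lexicon words overlap in the text without one containing the other, so that both cannot be kept in the same segmentation. The sketch below only illustrates how such overlaps can be detected; it is not the five-rule algorithm of the paper, which this abstract does not spell out. The function name `find_crossing_ambiguities`, the `max_word_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
def find_crossing_ambiguities(text, lexicon, max_word_len=4):
    """Return (left_word, right_word, overlap) triples for lexicon words
    that overlap in `text` without one containing the other.

    Illustrative only: single-character words are ignored, and no attempt
    is made to decide which of the two readings is correct.
    """
    # Collect every occurrence of a multi-character lexicon word as (start, end).
    spans = [
        (i, j)
        for i in range(len(text))
        for j in range(i + 2, min(len(text), i + max_word_len) + 1)
        if text[i:j] in lexicon
    ]
    hits = []
    for i, j in spans:
        for k, l in spans:
            # Crossing: the second word starts inside the first and ends after it.
            if i < k < j < l:
                hits.append((text[i:j], text[k:l], text[k:j]))
    return hits


# Hypothetical toy lexicon; a real system would load a full dictionary.
lexicon = {"研究", "研究生", "生命", "起源"}
print(find_crossing_ambiguities("研究生命起源", lexicon))
```

In this example the words 研究生 ("graduate student") and 生命 ("life") share the character 生, so the string can be cut as 研究生/命 or 研究/生命. Choosing between the two readings is the task that the proposed rules address.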

CLC number:

  • TP391.12