
• Computer Science (计算机科学) •

An automatic and dictionary-free Chinese word segmentation method based on suffix array

ZHANG Chang-li, HE Feng-ling, ZUO Wan-li   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012
  • Received: 2004-02-25   Online: 2004-10-26   Published: 2004-10-26
  • Contact: ZHANG Chang-li

Abstract: An automatic, dictionary-free Chinese word segmentation method based on the suffix array algorithm is proposed. Co-occurrence patterns of Chinese characters are extracted with a suffix-array-based algorithm and a HashMap, and Chinese words are then filtered from these patterns by confidence. Experimental results show that the algorithm acquires high-frequency lexical items effectively and efficiently, without the aid of either a dictionary or a corpus. The method is particularly suitable for Chinese information processing applications that are sensitive to lexical frequency or time-critical.
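The abstract names the pipeline (suffix array, HashMap of co-occurrence patterns, confidence filter) but not its exact definitions, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. Assumptions: the suffix array is built naively by sorting suffixes, the LCP array uses Kasai's algorithm, a Python `dict` plays the role of the HashMap, and the confidence of a string w is taken, hypothetically, as min(f(w)/f(w[:-1]), f(w)/f(w[1:])). All function names, the delimiter set, and the thresholds `max_len`, `min_freq`, `min_conf` are illustrative.

```python
# Hypothetical sketch: repeated-substring extraction via suffix array + LCP,
# followed by a confidence filter. Not the paper's exact algorithm.

DELIMITERS = set("，。、；：？！,.;:?! \n")  # assumed: candidates may not span punctuation


def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by suffix content.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    # Kasai's algorithm: lcp[k] = longest common prefix of sa[k-1] and sa[k].
    n = len(text)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def count_repeated_ngrams(text: str, max_len: int, min_freq: int = 2) -> dict[str, int]:
    # Suffixes sharing a prefix of length L are contiguous in the suffix array,
    # so the size of each run with lcp >= L is that prefix's frequency.
    sa = build_suffix_array(text)
    lcp = lcp_array(text, sa)
    n = len(text)
    freq: dict[str, int] = {}  # the "HashMap": substring -> frequency
    for length in range(1, max_len + 1):
        i = 0
        while i < n:
            if n - sa[i] < length:  # suffix too short to contain an L-gram
                i += 1
                continue
            j = i + 1
            while j < n and lcp[j] >= length:
                j += 1
            gram = text[sa[i]:sa[i] + length]
            if j - i >= min_freq and not DELIMITERS.intersection(gram):
                freq[gram] = j - i
            i = j
    return freq


def confidence(word: str, freq: dict[str, int]) -> float:
    # Hypothetical definition: how reliably each (L-1)-gram extends to the word.
    if len(word) < 2:
        return 0.0
    f = freq[word]
    return min(f / freq[word[:-1]], f / freq[word[1:]])


def extract_words(text: str, max_len: int = 4, min_freq: int = 2,
                  min_conf: float = 0.8) -> dict[str, int]:
    # Keep multi-character candidates whose confidence passes the threshold.
    freq = count_repeated_ngrams(text, max_len, min_freq)
    return {w: f for w, f in freq.items()
            if len(w) >= 2 and confidence(w, freq) >= min_conf}
```

For example, on the string 计算机科学计算机网络计算机科学 with the default thresholds, 计算机 is kept with frequency 3, while 机科 (frequency 2 against 3 for 机, confidence 2/3) is rejected and 网络 (frequency 1) never becomes a candidate. Fragments such as 算机 also pass this simple filter, so a practical system would need a further step to discard substrings of longer words; the paper's own filtering may differ.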

Key words: Chinese information processing, automatic Chinese word segmentation, suffix array, HashMap

CLC Number: TP391.12