J4 ›› 2009, Vol. 47 ›› Issue (4): 790-794.

Research and Implementation of Related Algorithms for Chinese Text Categorization

 XU Pei-Juan, LI Xiong-Fei, HUI Yue, ZHANG Gui-Lin   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received:2009-01-14 Online:2009-07-26 Published:2009-08-24
  • Contact: LI Xiong-Fei E-mail:lxf@jlu.edu.cn

Abstract:

Based on an analysis of how ambiguity in Chinese word segmentation is handled, this paper presents a context-based bidirectional scan word segmentation algorithm. To improve the segmentation dictionary, the authors added fixed phrases to it, and they discuss feature selection and the design of the weighting scheme in detail. To address the shortcomings of the general TFIDF weighting scheme, the authors took statistical information into account and proposed an item-scoring method that makes feature items more effective for text categorization. Finally, experiments demonstrated the advantage of the improved weighting scheme.
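
For orientation, the general TFIDF weighting that the abstract takes as its starting point can be sketched as below. This is only the standard textbook baseline (term frequency times log inverse document frequency), not the improved item-scoring scheme proposed in the paper, whose details the abstract does not give. The function name `tfidf` and the toy documents are illustrative assumptions, and the input is assumed to be already segmented into words.

```python
import math
from collections import Counter

def tfidf(docs):
    """Standard TF-IDF baseline (not the paper's improved scheme).

    docs: list of documents, each a list of already-segmented tokens.
    Returns one {term: weight} dict per document.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        # weight = (relative term frequency) * log(N / df)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

# Hypothetical, pre-segmented toy documents.
docs = [["文本", "分类", "文本"], ["分词", "分类"]]
w = tfidf(docs)
```

A term that occurs in every document, such as 分类 here, receives weight zero under this baseline, which illustrates the kind of limitation that motivates refined weighting schemes.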

Key words: text categorization; context bidirectional scan; vector space model; weighting scheme; feature selection

CLC Number: 

  • TP391.1