
Computer Science (计算机科学)

Implementation of a Chinese and English Clustering Engine Based on an Improved Suffix Tree Algorithm

HU Hailong1, SUN Chen2, HE Fengling1, ZUO Wanli1   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China;
  2. College of Communication Engineering, Jilin University, Changchun 130012, China
  • Received: 2008-06-13 Online: 2009-03-26 Published: 2009-03-26
  • Contact: HE Fengling

Abstract: This paper presents an algorithm based on an improved suffix tree and the idea of interactive clustering. Hierarchical clustering of document titles and summaries is implemented with an improved version of the traditional suffix tree structure, and interactive clustering is used in place of the traditional recursive algorithm. The algorithm is language-independent: it applies not only to word-based English but also, effectively, to character-based Chinese, without dictionary-based Chinese word segmentation. An interactive clustering engine was built on this algorithm, tested in different network environments, and compared with other metasearch engines. The experimental results demonstrate that real-time interactive clustering with the improved suffix tree algorithm is feasible and effective.
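The abstract does not describe the improved suffix tree itself, so the following is only a minimal Python sketch of the general idea, under several assumptions. Tokenization treats each CJK ideograph as one token and each run of Latin letters or digits as one word token, which is one way to avoid dictionary-based segmentation. It builds a naive generalized suffix trie (quadratic per document) rather than Ukkonen's linear-time suffix tree or the authors' improved structure. The names `tokenize`, `render`, `build_suffix_trie` and `expand` are hypothetical, and stopword filtering and the base-cluster merging step of classic suffix tree clustering are omitted. "Interactive" is modeled here as building the trie once and having `expand` extract only one level of sub-clusters each time a user opens a cluster, instead of recursively building the whole hierarchy up front.

```python
import re

# Assumption: CJK ideographs are one-character tokens (no dictionary-based
# word segmentation); runs of Latin letters/digits are word tokens.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def tokenize(text):
    """Split a title or snippet into language-independent tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def render(tokens):
    """Join tokens for display: spaces between Latin words, none around CJK."""
    out = ""
    for tok in tokens:
        if out and tok.isascii() and out[-1].isascii():
            out += " "
        out += tok
    return out


class Node:
    def __init__(self):
        self.children = {}  # token -> Node
        self.docs = set()   # ids of documents containing the phrase on this path


def build_suffix_trie(docs):
    """Naive generalized suffix trie: inserts every token suffix of every
    document, O(n^2) per document. Adequate only for short titles/snippets."""
    root = Node()
    for doc_id, text in enumerate(docs):
        tokens = tokenize(text)
        for start in range(len(tokens)):
            node = root
            for tok in tokens[start:]:
                node = node.children.setdefault(tok, Node())
                node.docs.add(doc_id)
    return root


def expand(node, prefix=(), min_docs=2):
    """One interactive step: return the sub-clusters directly below `node`
    as (phrase, doc_ids, child_node), each shared by >= min_docs documents.
    Deeper levels are extracted only when a cluster is opened."""
    clusters = []
    for tok, child in node.children.items():
        if len(child.docs) < min_docs:
            continue
        phrase = prefix + (tok,)
        # Compress single-child chains whose document set is unchanged,
        # the way one suffix-tree edge label can span several tokens.
        while len(child.children) == 1:
            next_tok, grandchild = next(iter(child.children.items()))
            if grandchild.docs != child.docs:
                break
            phrase += (next_tok,)
            child = grandchild
        clusters.append((phrase, sorted(child.docs), child))
    # Larger clusters first; ties broken by label.
    clusters.sort(key=lambda c: (-len(c[1]), render(c[0])))
    return clusters


if __name__ == "__main__":
    docs = [
        "Suffix tree clustering for search results",
        "Improved suffix tree algorithm",
        "后缀树聚类算法",
        "改进的后缀树算法",
    ]
    root = build_suffix_trie(docs)
    for phrase, ids, node in expand(root):
        print(render(phrase), ids)
        # Drill down on demand (done immediately here for the demo).
        for sub, sub_ids, _ in expand(node, phrase, min_docs=1):
            print("   ", render(sub), sub_ids)
```

Because a trie node can only contain documents that also contain its parent's phrase, each call to `expand` returns sub-clusters of the cluster being opened, which is what makes level-by-level drill-down possible without a recursive pass over the whole tree. The same code handles "suffix tree" and "后缀树" without any language-specific dictionary.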

Key words: suffix tree, text clustering, metasearch engine

CLC Number: TP31