Journal of Jilin University (Information Science Edition), 2016, Vol. 34, Issue (4): 543-549.

Improved Algorithm of Web Retrieve Results Clustering Based on Suffix Tree

DONG Yaze a, LI Wanlong b, LI Hang b, ZHENG Shanhong b   

  1. a. School of Application Technology; b. School of Computer Science & Engineering,
    Changchun University of Technology, Changchun 130012, China
  • Received:2015-12-17 Online:2016-07-25 Published:2017-01-16

Abstract: Improving the accuracy and precision of search engines is a key problem that urgently needs to be solved in the Internet era. Building on the basic suffix tree clustering algorithm, an improved suffix-tree-based algorithm for clustering search results is proposed, in which the vector space model is combined with suffix tree clustering to improve the merging of base clusters. In addition, the number of texts corresponding to a base cluster node, the number of words in the phrase, the phrase weight, and whether the phrase contains the query terms are combined as the selection condition for cluster labels, which improves the rationality and readability of the labels. Finally, the method is tested on text classification data from the Sogou corpus. The experimental results show that the method improves the accuracy of the clustering results to a certain extent.
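
The abstract does not give the formulas used, so the following is only a minimal sketch of how a vector space model check might be combined with suffix tree clustering when merging base clusters (base classes). The document-overlap test with a 0.5 threshold is the merge rule of classic suffix tree clustering; requiring an additional cosine similarity between cluster centroids, the `cos_min` threshold, and the function names `cosine`, `centroid`, and `should_merge` are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dict-like)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(doc_ids, doc_vectors):
    """Sum the term vectors of all documents in one base cluster."""
    total = Counter()
    for d in doc_ids:
        total.update(doc_vectors[d])
    return total


def should_merge(a, b, doc_vectors, overlap_min=0.5, cos_min=0.5):
    """Decide whether base clusters a and b (sets of document ids) merge.

    overlap test: classic suffix tree clustering rule on shared documents.
    cosine test: hypothetical VSM condition (assumed, not from the paper).
    """
    common = len(a & b)
    overlap_ok = (common / len(a) > overlap_min
                  and common / len(b) > overlap_min)
    cos_ok = cosine(centroid(a, doc_vectors),
                    centroid(b, doc_vectors)) >= cos_min
    return overlap_ok and cos_ok


# Toy term-frequency vectors for three retrieved snippets (made-up data).
doc_vectors = {
    0: Counter({"suffix": 2, "tree": 2}),
    1: Counter({"suffix": 1, "tree": 1, "cluster": 1}),
    2: Counter({"football": 3}),
}

print(should_merge({0, 1}, {0, 1, 2}, doc_vectors))  # shares most documents
print(should_merge({0, 1}, {2}, doc_vectors))        # shares no documents
```

In this sketch the overlap test keeps the merge tied to shared documents, while the cosine test compares the term content of the two clusters. Other combinations (for example, a weighted sum of the two scores) would be equally plausible readings of the abstract.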

Key words: suffix tree, text clustering, Web retrieval results, vector space model

CLC Number: 

  • TP39