Improved Suffix-Tree-Based Clustering Algorithm for Web Search Results

Journal of Jilin University (Information Science Ed.), 2016, Vol. 34, Issue 4: 543-549.


Improved Algorithm of Web Retrieve Results Clustering Based on Suffix Tree

DONG Yaze a, LI Wanlong b, LI Hang b, ZHENG Shanhong b

a. School of Application Technology; b. School of Computer Science & Engineering,
Changchun University of Technology, Changchun 130012, China

Received: 2015-12-17; Online: 2016-07-25; Published: 2017-01-16

Abstract

Improving the accuracy and precision of search engines is a pressing problem in the Internet era. Building on the basic model of the suffix tree clustering algorithm, this paper proposes an improved algorithm for clustering search results, in which the vector space model is combined with suffix tree clustering to improve the merging of base clusters. In addition, the number of documents associated with a base cluster node, the number of words in the phrase, the phrase weight, and whether the phrase contains the query terms are combined as the selection criteria for cluster labels, which improves the plausibility and readability of the labels. Finally, the method is evaluated on the text classification data in the Sogou corpus. The experimental results show that the method improves the accuracy of the clustering results to a certain extent.
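The abstract does not give formulas, so the following Python sketch only illustrates the two ideas it names. Everything here is an assumption made for illustration: the function names, the thresholds, the rule "merge when the classic document-overlap test passes or the cluster centroids are close in cosine similarity", and the linear weighting of the four label criteria. It is not the authors' implementation.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(doc_ids, doc_vectors):
    """Sum of the document vectors in a base cluster.

    Cosine similarity ignores scale, so the sum stands in for the mean.
    """
    total = Counter()
    for doc_id in doc_ids:
        total.update(doc_vectors[doc_id])
    return dict(total)


def should_merge(a, b, doc_vectors, overlap_min=0.5, cos_min=0.5):
    """Decide whether two base clusters (sets of document ids) are merged.

    Assumed rule: merge if the classic suffix tree clustering overlap test
    passes, or if the vector space model says the clusters are similar.
    """
    shared = len(a & b)
    overlap_ok = shared / len(a) > overlap_min and shared / len(b) > overlap_min
    similar = cosine(centroid(a, doc_vectors), centroid(b, doc_vectors)) >= cos_min
    return overlap_ok or similar


def label_score(phrase_words, n_docs, phrase_weight, query_terms,
                weights=(1.0, 1.0, 1.0, 1.0)):
    """Score a candidate cluster label from the four criteria in the abstract.

    The linear combination and the default weights are hypothetical.
    """
    has_query = any(word in query_terms for word in phrase_words)
    return (weights[0] * n_docs
            + weights[1] * len(phrase_words)
            + weights[2] * phrase_weight
            + weights[3] * (1.0 if has_query else 0.0))
```

In this sketch, two base clusters that share too few documents to pass the overlap test can still be merged when their term vectors point in nearly the same direction, which is one plausible reading of how a vector space model could improve base cluster merging. The candidate phrase with the highest `label_score` within a merged cluster would be chosen as its label.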

Key words: suffix tree, text clustering, Web retrieval results, vector space model

CLC Number: TP39

DONG Yaze, LI Wanlong, LI Hang, ZHENG Shanhong. Improved Algorithm of Web Retrieve Results Clustering Based on Suffix Tree[J]. Journal of Jilin University (Information Science Ed.), 2016, 34(4): 543-549.
