Journal of Jilin University Science Edition ›› 2019, Vol. 57 ›› Issue (5): 1193-1199.

Chinese Text Clustering Algorithm Based on Semantic Cluster

QI Xiangming, SUN Xujiao   

  1. College of Software, Liaoning Technical University, Huludao 125105, Liaoning Province, China
  • Received: 2018-07-11 Online: 2019-09-26 Published: 2019-09-20
  • Contact: QI Xiangming E-mail: qixiangming1223@163.com

Abstract: Chinese text clustering is influenced by semantic, grammatical, and contextual factors. When documents are represented with the traditional vector space model, the text vectors are treated as mutually independent and the semantic relations between words are ignored, which degrades the clustering results. To address this problem, we propose a Chinese text clustering algorithm based on semantic clusters. The algorithm builds on the principles of word co-occurrence and semantic relevance. Firstly, the term frequency-inverse document frequency (TF-IDF) method is used to obtain the weights of feature words, and the collocation vectors of the feature words are used to construct semantic clusters. Secondly, using the weights of the feature words and their collocation words, the feature words are spatially transformed toward the semantic cluster centers, which yields document vectors with embedded semantic information. Finally, the document vectors are used for K-means clustering analysis. The experimental results show that this vectorization method effectively improves how closely the text vectors approximate the text semantics, and improves the accuracy and recall of the text clustering results.
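
The three stages named in the abstract (TF-IDF weighting, semantic embedding through collocation information, K-means) can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so everything specific here is an assumption made for illustration: a row-normalised document-level co-occurrence matrix stands in for the collocation vectors and semantic cluster centers, the documents are toy, pre-segmented token lists, and the helper names (`tfidf`, `cooccurrence`, `embed`, `kmeans`) are hypothetical. Real Chinese text would first need word segmentation.

```python
import math
from collections import Counter


def tfidf(docs):
    """Plain TF-IDF weights for pre-tokenised documents (lists of words)."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[w] / len(d) * idf[w] for w in vocab])
    return vocab, rows


def cooccurrence(docs, vocab):
    """Row-normalised document-level co-occurrence counts.

    Assumption: this is only a stand-in for the paper's collocation vectors.
    """
    idx = {w: i for i, w in enumerate(vocab)}
    m = [[0.0] * len(vocab) for _ in vocab]
    for d in docs:
        present = set(d)
        for a in present:
            for b in present:
                m[idx[a]][idx[b]] += 1.0
    return [[v / sum(row) for v in row] for row in m]


def embed(rows, co):
    """Spread each word's weight over its co-occurring words (rows times co)."""
    size = len(co)
    return [[sum(r[i] * co[i][j] for i in range(size)) for j in range(size)]
            for r in rows]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Basic K-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy corpus: two finance documents, two sports documents.
docs = [
    ["stock", "market", "price"],
    ["stock", "price", "trade"],
    ["football", "match", "goal"],
    ["football", "goal", "team"],
]
vocab, rows = tfidf(docs)
embedded = embed(rows, cooccurrence(docs, vocab))
labels = kmeans(embedded, k=2)
print(labels)
```

In this sketch the first document receives a non-zero weight for "trade" after embedding, even though the word never appears in it, because "trade" co-occurs with "stock" and "price" elsewhere in the corpus. That is the kind of dependence between feature words that the plain vector space model ignores.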

Key words: vector, feature word, semantic cluster, semantic embedding, cluster analysis

CLC Number: TP391