Network News Topics Discovery Based on Improved Single-Pass Algorithm
SUN Hongguang, GAO Xing, SUN Tieli, YANG Fengqin, PENG Yang, FENG Guozhong
Journal of Jilin University Science Edition. 2018, 56 (1): 114-118.
Using an improved Single-Pass incremental text clustering algorithm, we organized news information at the granularity of topics and thereby discovered network news topics. To account for the dynamic and time-dependent character of news, we optimized the feature term weight calculation using the position of terms in headlines and body text and the incremental document frequency of terms. We also added a time factor to the similarity calculation and updated the topic centroid vectors dynamically during clustering. A news corpus built with a topic-based Web crawler served as the test data set. The experimental results show that, compared with the traditional algorithm, the improved algorithm reduces the cost and the fallout ratio by 0.34% and 1.57% respectively, which verifies the validity and accuracy of the improved algorithm.
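As a rough illustration of the clustering step described in the abstract, the sketch below runs a Single-Pass loop in which the similarity is multiplied by a time factor and the matched topic's centroid is updated after every assignment. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the exponential `time_decay` function, the `half_life` and `threshold` parameters, the running-mean centroid update, and the `Topic` structure. Term weighting (headline position, incremental document frequency) is not covered; documents are assumed to arrive already weighted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Topic:
    centroid: dict      # term -> weight, running mean of member vectors
    last_time: float    # timestamp (in days) of the newest member
    size: int = 1
    members: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def time_decay(dt_days, half_life=3.0):
    """Hypothetical time factor: halves every `half_life` days."""
    return 0.5 ** (abs(dt_days) / half_life)


def single_pass(docs, threshold=0.5, half_life=3.0):
    """docs: iterable of (doc_id, term_weights, timestamp_in_days),
    processed once, in arrival order."""
    topics = []
    for doc_id, vec, t in docs:
        best, best_sim = None, 0.0
        for topic in topics:
            # Content similarity damped by the time gap to the topic.
            sim = cosine(vec, topic.centroid) * time_decay(
                t - topic.last_time, half_life)
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            # Dynamic centroid update: running mean over all members.
            n = best.size
            terms = set(best.centroid) | set(vec)
            best.centroid = {
                k: (best.centroid.get(k, 0.0) * n + vec.get(k, 0.0)) / (n + 1)
                for k in terms
            }
            best.size = n + 1
            best.last_time = max(best.last_time, t)
            best.members.append(doc_id)
        else:
            # No topic is close enough: the document seeds a new topic.
            topics.append(Topic(centroid=dict(vec), last_time=t,
                                members=[doc_id]))
    return topics


docs = [
    ("a", {"quake": 1.0, "tokyo": 1.0}, 0.0),
    ("b", {"quake": 1.0, "tokyo": 0.8}, 1.0),
    ("c", {"election": 1.0}, 1.0),
    ("d", {"quake": 1.0, "tokyo": 1.0}, 30.0),
]
topics = single_pass(docs)
print([topic.members for topic in topics])
```

With these assumed settings, document `d` has nearly the same content as the first topic but arrives 29 days after its last member, so the time factor pushes the combined similarity below the threshold and `d` starts a new topic. Dropping the time factor would merge it into the older one, which is the kind of behavior a time-aware similarity is meant to prevent.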