Journal of Jilin University Science Edition

Network News Topics Discovery Based on Improved Single-Pass Algorithm

SUN Hongguang1,2, GAO Xing3, SUN Tieli1,2, YANG Fengqin1, PENG Yang1, FENG Guozhong1   

  1. School of Information Science and Technology, Northeast Normal University, Changchun 130117, China; 2. Key Lab of Intelligent Information Processing of Jilin Universities, Changchun 130117, China; 3. Liberation Army Daily, Beijing 100832, China
  • Received: 2016-10-24 Online: 2018-01-26 Published: 2018-01-24
  • Contact: SUN Tieli E-mail: suntl@nenu.edu.cn

Abstract: Using an improved Single-Pass incremental text clustering algorithm, we organized news at the granularity of topics and thereby discovered network news topics. To account for the dynamic, time-dependent nature of news, we optimized the feature term weight calculation by incorporating the position of terms in headlines and body text and the incremental document frequency of terms; we also added a time factor to the similarity calculation and dynamically updated the topic centroid vectors during clustering. A news corpus built with a topic-based Web crawler served as the test data set. The experimental results show that, compared with the traditional algorithm, the improved algorithm reduces the cost and the fallout ratio by 0.34% and 1.57%, respectively, which verifies the validity and accuracy of the improved algorithm.
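
The clustering loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the headline weight `title_boost`, the similarity `threshold`, the exponential time decay with rate `decay`, the whitespace tokenizer, and the running-mean centroid update are all assumptions chosen for illustration. The incremental document-frequency weighting mentioned in the abstract is omitted for brevity.

```python
import math
from collections import Counter


def doc_vector(title, body, title_boost=2.0):
    """Term-weight vector; headline terms get extra weight (assumed boost)."""
    vec = Counter()
    for term in body.lower().split():
        vec[term] += 1.0
    for term in title.lower().split():
        vec[term] += title_boost
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(docs, threshold=0.3, decay=0.1):
    """Cluster (day, title, body) tuples in arrival order, one pass only.

    The time factor exp(-decay * gap_in_days) is an assumed form: it
    lowers similarity to topics whose latest story is far away in time.
    """
    topics = []
    for idx, (day, title, body) in enumerate(docs):
        vec = doc_vector(title, body)
        best, best_sim = None, 0.0
        for topic in topics:
            gap = abs(day - topic["last_day"])
            sim = cosine(vec, topic["centroid"]) * math.exp(-decay * gap)
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            # Dynamic centroid update: running mean over member vectors.
            n = best["n"]
            terms = set(best["centroid"]) | set(vec)
            best["centroid"] = {
                t: (best["centroid"].get(t, 0.0) * n + vec.get(t, 0.0)) / (n + 1)
                for t in terms
            }
            best["n"] = n + 1
            best["last_day"] = max(best["last_day"], day)
            best["members"].append(idx)
        else:
            # No topic is similar enough: the document seeds a new topic.
            topics.append(
                {"centroid": vec, "n": 1, "last_day": day, "members": [idx]}
            )
    return topics
```

Because each document is compared only against the current topic centroids and is never revisited, the result depends on arrival order, which is why the sketch expects the stories sorted by time.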

Key words: text clustering, Single-Pass algorithm, topic discovery

CLC Number: 

  • TP311.5