Journal of Jilin University (Information Science Edition), 2021, Vol. 39, Issue (6): 751-757.

Classification Method of Microblog Violence Text Based on Improved TFIDF-Logistic Regression

LIU Sixin a, GAO Jun b, TIAN Yilong b, WEI Yunli b, LI Xurui b, WU Jing b

  1. a. College of Automobile; b. College of Computer Science and Technology, Jilin University, Changchun 130022, China
  • Received: 2021-04-14 Online: 2021-12-01 Published: 2021-12-02

Abstract: To automatically identify and detect violent speech on Weibo, a data set is built from a microblog corpus and cleaned, following a review of domestic and international research on violent text recognition. An improved TFIDF text vectorization method is proposed. Vectors produced by the traditional TFIDF method and by the improved method are each used as input to a logistic regression model, giving two violent text classifiers, one per vectorization method, which are then evaluated and compared. The experimental results show that the improved method achieves an AUC of 0.969 and an accuracy of 0.970, which are 14.4% and 15.5% higher, respectively, than those of the traditional method.
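
As a rough illustration of the baseline side of this comparison, the sketch below builds classic TFIDF vectors, fits a logistic regression classifier by stochastic gradient descent, and reports accuracy and AUC. It covers only the traditional method: the abstract does not describe how the improved TFIDF weighting is computed, so no attempt is made to reproduce it. The tokenized documents, labels, function names, and hyperparameters are all hypothetical, and the model is scored on its own training data purely to keep the example short.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Classic TF-IDF: tf = count / doc length, idf = log(N / document frequency)."""
    vocab = sorted({w for d in docs for w in d})
    n_docs = len(docs)
    doc_freq = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n_docs / doc_freq[w]) for w in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append([counts[w] / len(d) * idf[w] for w in vocab])
    return vocab, vectors


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=1.0, epochs=500):
    """Logistic regression fitted with plain stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def accuracy(y, probs, threshold=0.5):
    hits = sum(int(p >= threshold) == yi for p, yi in zip(probs, y))
    return hits / len(y)


def auc(y, probs):
    """AUC as the fraction of (positive, negative) pairs ranked correctly (ties count 0.5)."""
    pos = [p for p, yi in zip(probs, y) if yi == 1]
    neg = [p for p, yi in zip(probs, y) if yi == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical, already-tokenized toy posts; 1 = violent, 0 = non-violent.
docs = [
    ["kill", "you", "idiot"],
    ["hate", "you", "idiot"],
    ["beat", "him", "up"],
    ["nice", "weather", "today"],
    ["love", "this", "song"],
    ["great", "dinner", "today"],
]
labels = [1, 1, 1, 0, 0, 0]

vocab, X = tfidf_vectors(docs)
w, b = train_logreg(X, labels)
probs = predict_proba(X, w, b)
acc = accuracy(labels, probs)
auc_score = auc(labels, probs)
print(f"accuracy={acc:.3f} auc={auc_score:.3f}")
```

A realistic evaluation would hold out a test split and segment Chinese microblog text into words before vectorization; both steps are omitted here.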

Key words: internet violence, microblog text, text vectorization, text classification, machine learning

CLC Number: TP3