Journal of Jilin University (Information Science Edition), 2021, Vol. 39, Issue (6): 751-757.
LIU Sixin a, GAO Jun b, TIAN Yilong b, WEI Yunli b, LI Xurui b, WU Jing b
Abstract: To enable automatic identification and detection of violent speech on the Weibo network, domestic and international research on violent-text recognition is first reviewed; a data set is then built from a microblog corpus and cleaned. An improved TFIDF text vectorization method is proposed. Vectors produced by the traditional TFIDF method and by the improved method are each used as input to a logistic regression model, yielding a traditional and an improved logistic regression classifier for violent text, and the two models are evaluated and compared. Experimental results show that the improved method achieves an AUC of 0.969 and an accuracy of 0.970, which are 14.4% and 15.5% higher, respectively, than those of the traditional method.
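The abstract does not say how the improved TFIDF weighting works, so the following is only a minimal sketch of the traditional baseline it is compared against: TF-IDF vectors fed into a logistic regression classifier, scored by AUC and accuracy. It rests on several assumptions that are not from the paper: whitespace tokenization stands in for the Chinese word segmentation that real Weibo text would need; the IDF uses the smoothed form idf(t) = ln((1 + N) / (1 + df(t))) + 1 (the default in scikit-learn's `TfidfVectorizer`); and the toy corpus, learning rate, and epoch count are hypothetical. All function names are illustrative, and an improved weighting scheme would presumably replace `fit_idf` and `tfidf_vector`.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenizer; Chinese microblog text would need a word segmenter.
    return text.lower().split()


def fit_idf(docs):
    """Traditional smoothed IDF, ln((1 + N) / (1 + df)) + 1; NOT the paper's improved weighting."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(doc, idf, vocab):
    """Relative term frequency times IDF, then L2-normalised."""
    counts = Counter(tokenize(doc))
    total = sum(counts.values()) or 1
    vec = [counts[t] / total * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on mean log-loss; lr and epochs are arbitrary illustrative values."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(y, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y, scores, threshold=0.5):
    return sum((s >= threshold) == (t == 1) for s, t in zip(scores, y)) / len(y)


# Hypothetical toy corpus (1 = violent, 0 = non-violent); not data from the paper.
docs = ["you are an idiot go die", "what a lovely sunny day",
        "idiot trash shut up", "lovely photo of the sunset",
        "die you trash", "happy birthday have a lovely day"]
labels = [1, 0, 1, 0, 1, 0]

idf = fit_idf(docs)
vocab = sorted(idf)
X = [tfidf_vector(d, idf, vocab) for d in docs]
w, b = train_logreg(X, labels)
scores = predict_proba(X, w, b)
train_auc = auc(labels, scores)
train_acc = accuracy(labels, scores)
print(f"train AUC={train_auc:.3f}, accuracy={train_acc:.3f}")
```

The scores here are computed on the training documents purely for illustration; a real comparison of traditional and improved vectorization would evaluate both classifiers on a held-out split of the cleaned microblog data set.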
Key words: internet violence, microblog text, text vectorization, text classification, machine learning
LIU Sixin, GAO Jun, TIAN Yilong, WEI Yunli, LI Xurui, WU Jing. Classification Method of Microblog Violence Text Based on Improved TFIDF-Logistic Regression[J]. Journal of Jilin University (Information Science Edition), 2021, 39(6): 751-757.
URL: http://xuebao.jlu.edu.cn/xxb/EN/Y2021/V39/I6/751