Classification Method of Microblog Violence Text Based on Improved TFIDF-Logistic Regression

Journal of Jilin University (Information Science Edition), 2021, Vol. 39, Issue 6: 751-757.



LIU Sixin^a, GAO Jun^b, TIAN Yilong^b, WEI Yunli^b, LI Xurui^b, WU Jing^b

a. College of Automobile; b. College of Computer Science and Technology, Jilin University, Changchun 130022, China

Received: 2021-04-14; Online: 2021-12-01; Published: 2021-12-02

Abstract

To enable automatic identification and detection of violent speech on the Weibo (microblog) network, domestic and international research on violent text recognition is first reviewed; a data set is then built from a microblog corpus and cleaned. An improved TFIDF text vectorization method is proposed. Vectors constructed by the traditional TFIDF method and by the improved method are each used as input to a logistic regression model, producing two logistic regression violent-text classifiers, one for the traditional method and one for the improved method. The two models are then evaluated and compared. Experimental results show that the improved method achieves an AUC of 0.969 and an accuracy of 0.970, which are 14.4% and 15.5% higher, respectively, than those of the traditional method.
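The abstract does not say how the improved TFIDF weighting differs from the traditional one, so the sketch below only illustrates the traditional baseline pipeline that the improved method is compared against: TFIDF vectors fed into a binary logistic regression classifier, scored by accuracy and AUC. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The smoothed IDF variant, L2 row normalization, relative term frequency, gradient-descent hyperparameters (`lr`, `epochs`, `l2`), all function names, and the toy posts are hypothetical choices made for illustration. Real Weibo text would first need Chinese word segmentation, which is assumed to have already produced the token lists here.

```python
import math
from collections import Counter


def tfidf_fit(docs):
    """Learn a sorted vocabulary and IDF weights from tokenized documents."""
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    # Assumed smoothed IDF variant: log((1 + n) / (1 + df)) + 1, always >= 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf


def tfidf_transform(docs, vocab, idf):
    """Map tokenized documents to L2-normalized TF-IDF vectors (unknown terms dropped)."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # relative term frequency; guards empty docs
        vec = [counts[t] / total * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return rows


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500, l2=1e-3):
    """Binary logistic regression fit by batch gradient descent on L2-regularized log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n  # bias is not regularized
    return w, b


def predict_proba(X, w, b):
    """Estimated probability that each row belongs to class 1 (violent)."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def accuracy(y, p, threshold=0.5):
    """Fraction of rows whose thresholded prediction matches the label."""
    return sum((pi >= threshold) == bool(yi) for yi, pi in zip(y, p)) / len(y)


def auc(y, p):
    """ROC AUC: chance a random positive outscores a random negative (ties = 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a in pos for c in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical pre-tokenized posts (not the paper's data); 1 = violent, 0 = not.
    posts = [
        (["you", "idiot", "go", "die"], 1),
        (["beat", "you", "up", "idiot"], 1),
        (["trash", "go", "die"], 1),
        (["nice", "photo", "today"], 0),
        (["great", "match", "today"], 0),
        (["love", "this", "photo"], 0),
    ]
    docs = [tokens for tokens, _ in posts]
    labels = [label for _, label in posts]
    vocab, idf = tfidf_fit(docs)
    X = tfidf_transform(docs, vocab, idf)
    w, b = train_logreg(X, labels)
    probs = predict_proba(X, w, b)
    # Training-set scores only; a real evaluation would use a held-out split.
    print("accuracy:", accuracy(labels, probs))
    print("AUC:", auc(labels, probs))
```

In this design, trying a different TFIDF weighting means changing only `tfidf_fit` and `tfidf_transform` while the classifier and metrics stay fixed, which mirrors the comparison described above: both vectorizations feed the same logistic regression model and are judged by the same AUC and accuracy.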

Key words: internet violence, microblog text, text vectorization, text classification, machine learning


Cite this article: LIU Sixin, GAO Jun, TIAN Yilong, WEI Yunli, LI Xurui, WU Jing. Classification Method of Microblog Violence Text Based on Improved TFIDF-Logistic Regression[J]. Journal of Jilin University (Information Science Edition), 2021, 39(6): 751-757.
