Journal of Jilin University Science Edition ›› 2021, Vol. 59 ›› Issue (4): 929-935.

A Sentiment Analysis Method Based on Class Imbalance Learning

LI Fang1,2, QU Yubin1,3, CHEN Xiang4, LI Long1, YANG Fan5   

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, Guangxi Zhuang Autonomous Region, China; 2. School of Civil Engineering, Jiangsu College of Engineering and Technology, Nantong 226001, Jiangsu Province, China; 3. School of Information Engineering, Jiangsu College of Engineering and Technology, Nantong 226001, Jiangsu Province, China; 4. School of Information Science and Technology, Nantong University, Nantong 226019, Jiangsu Province, China; 5. Center of Library and Information, Jiangsu College of Engineering and Technology, Nantong 226001, Jiangsu Province, China
  • Received: 2020-08-31 Online: 2021-07-26 Published: 2021-07-26

Abstract: In online reviews, negative comments are generally fewer than positive ones yet carry more influence, which creates a class imbalance problem. To address it, we propose a sentiment analysis method based on class imbalance learning. The method uses the probability output produced during deep learning training to calculate the information entropy of each sample, and uses this entropy as an influence factor to construct a cross-entropy loss function. Experimental results on the public IMDB dataset show that a bidirectional long short-term memory network trained with the loss function that integrates information entropy can deal with the class imbalance problem. Statistical analysis of the data shows that this strategy improves the performance of sentiment polarity classification based on the bidirectional long short-term memory network. For the AUC (area under curve) indicator, the median of the bidirectional long short-term memory network model with the integrated information entropy loss function is 15.3% higher than that of the deep learning model that does not consider class imbalance.
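
The idea of weighting the loss by per-sample information entropy can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the binary setting, the multiplicative weighting, and the function names `binary_entropy`, `cross_entropy`, and `entropy_weighted_loss` are hypothetical and may differ from the loss used in the paper.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Information entropy (in nats) of a predicted positive-class probability."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def cross_entropy(y, p, eps=1e-12):
    """Standard binary cross-entropy for one sample with label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def entropy_weighted_loss(labels, probs):
    """Mean cross-entropy with each sample scaled by its prediction entropy.

    Assumption: the entropy acts as a multiplicative influence factor.
    The paper may combine the two terms differently.
    """
    weighted = [
        binary_entropy(p) * cross_entropy(y, p)
        for y, p in zip(labels, probs)
    ]
    return sum(weighted) / len(weighted)


# Toy batch: predicted positive-class probabilities for four comments.
labels = [1, 1, 1, 0]
probs = [0.95, 0.90, 0.85, 0.55]
loss = entropy_weighted_loss(labels, probs)
```

Under this weighting, samples the model is unsure about (probability near 0.5) contribute more to the loss than samples it predicts confidently. One side effect of this particular choice is that confidently wrong predictions are also down-weighted, so a practical version would likely need an additional term or a different combination.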

Key words: text classification, long short-term memory network, class imbalance, cross-entropy loss function

CLC Number: TP311