Journal of Jilin University (Science Edition) ›› 2021, Vol. 59 ›› Issue (4): 929-935.

A Sentiment Analysis Method Based on Class Imbalance Learning

LI Fang1,2, QU Yubin1,3, CHEN Xiang4, LI Long1, YANG Fan5

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, Guangxi Zhuang Autonomous Region, China; 2. School of Civil Engineering, Jiangsu College of Engineering and Technology, Nantong 226001, Jiangsu Province, China; 3. School of Information Engineering, Jiangsu College of Engineering and Technology, Nantong 226001, Jiangsu Province, China; 4. School of Information Science and Technology, Nantong University, Nantong 226019, Jiangsu Province, China; 5. Center of Library and Information, Jiangsu College of Engineering and Technology, Nantong 226001, Jiangsu Province, China
  • Received: 2020-08-31 Online: 2021-07-26 Published: 2021-07-26
  • Corresponding author: QU Yubin E-mail: quyubin@hotmail.com

Abstract: To address the class imbalance problem common in online reviews, where negative reviews are relatively few yet highly influential, we propose a sentiment analysis method based on class imbalance learning. The method uses the probability output produced during deep learning training to compute the information entropy of each sample, and uses that entropy as an influence factor to construct a cross information entropy loss function. Experimental results on the public IMDB dataset show that a bidirectional long short-term memory network using the integrated information entropy loss function can handle the class imbalance problem. Statistical analysis of the data shows that this strategy improves the performance of review sentiment polarity classification based on the bidirectional long short-term memory network. On the AUC (area under curve) metric, the median for the bidirectional long short-term memory network model with the integrated information entropy loss function is up to 15.3% higher than that of deep learning models that do not consider class imbalance.

Key words: text classification, long short-term memory network, class imbalance, cross-entropy loss function

CLC Number: 

  • TP311
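
The abstract describes weighting a cross-entropy loss by the information entropy of each sample's predicted probability. The sketch below illustrates that general idea for binary sentiment labels in plain Python. The exact formula used in the paper is not given on this page, so the weighting `1 + H(p) / log 2` and the function names `binary_entropy`, `cross_entropy`, and `entropy_weighted_loss` are assumptions made only for illustration.

```python
import math

EPS = 1e-12  # keeps probabilities away from 0 and 1 so log() stays finite


def _clamp(p: float) -> float:
    """Clamp a probability into the open interval (0, 1)."""
    return min(max(p, EPS), 1.0 - EPS)


def binary_entropy(p: float) -> float:
    """Shannon entropy (in nats) of a Bernoulli prediction with P(positive) = p."""
    p = _clamp(p)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def cross_entropy(y: int, p: float) -> float:
    """Standard binary cross-entropy for one sample with label y in {0, 1}."""
    p = _clamp(p)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def entropy_weighted_loss(labels, probs) -> float:
    """Mean cross-entropy with a per-sample entropy-based influence factor.

    Assumed weighting (not taken from the paper): w = 1 + H(p) / log 2,
    so uncertain predictions (p near 0.5) count up to twice as much as
    confident ones.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        weight = 1.0 + binary_entropy(p) / math.log(2)
        total += weight * cross_entropy(y, p)
    return total / len(labels)


if __name__ == "__main__":
    # Hypothetical mini-batch: 1 = positive review, 0 = negative review.
    labels = [1, 1, 1, 0]
    probs = [0.95, 0.90, 0.85, 0.55]
    print(entropy_weighted_loss(labels, probs))
```

Under this assumed weighting, samples the model is unsure about, which in an imbalanced review set are often the minority negative ones, contribute more to the loss than samples it already classifies confidently.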