Journal of Jilin University (Engineering and Technology Edition), 2022, Vol. 52, Issue (8): 1889-1895. doi: 10.13229/j.cnki.jdxbgxb20210167

Unbalanced text classification method based on deep learning

Xiao-ying LI1, Ming YANG1, Rui QUAN2, Bao-hua TAN3

  1. Industrial Design Engineering, Hubei University of Technology, Wuhan 430068, China
  2. Hubei Key Laboratory for High-Efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan 430068, China
  3. School of Science, Hubei University of Technology, Wuhan 430068, China
  • Received: 2021-03-05 Online: 2022-08-01 Published: 2022-08-12
  • Contact: Bao-hua TAN E-mail: yangyang1994121@163.com; tan_bh@126.com

Abstract:

In unbalanced text classification, results are biased toward the majority classes and neglect the minority classes, which degrades classification performance. This paper studies an unbalanced text classification method based on deep learning. The DA method is used to select features from unbalanced text: it takes the minimum difference in document probability correlation as its scoring criterion, so that the selected features are evenly distributed across the majority and minority classes, which improves the balance of the text features. The subset obtained by feature selection is used as the input to a deep belief network composed of multiple restricted Boltzmann machines. Each restricted Boltzmann machine obtains the optimal probability distribution of the training samples through pre-training, and its weights are determined by the contrastive divergence algorithm. Once the parameters of the restricted Boltzmann machines are set, a greedy algorithm trains them iteratively until text classification is complete. Experimental results show that the method classifies unbalanced text effectively, with a classification accuracy above 99.5%.
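The pre-training stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary bag-of-words features, full-batch updates, and one-step contrastive divergence (CD-1), and it omits both the DA feature selection and the final supervised classification step. The names `RBM`, `cd1_step`, and `pretrain_dbn`, as well as the layer sizes and learning rate, are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1 (illustrative only)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step gives a reconstruction.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # CD-1 update: data correlations minus reconstruction correlations.
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))  # reconstruction error


def pretrain_dbn(X, layer_sizes, epochs=20, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: each RBM is trained on the
    hidden activations of the RBM below it."""
    rng = np.random.default_rng(seed)
    rbms, data = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(data, lr)
        rbms.append(rbm)
        data = rbm.hidden_probs(data)  # input to the next layer
    return rbms, data


# Toy usage: 64 synthetic documents with 30 binary term features
# (made-up data, standing in for the selected feature subset).
toy_rng = np.random.default_rng(1)
X = (toy_rng.random((64, 30)) < 0.3).astype(float)
rbms, features = pretrain_dbn(X, layer_sizes=[16, 8])
print(features.shape)  # (64, 8)
```

In a complete pipeline, the top-layer `features` would feed a supervised classifier and the stack would typically be fine-tuned; that part is left out here because the abstract does not specify it.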

Key words: deep learning, imbalance, text, classification method, deep belief network, document probability, pre-training, contrastive divergence algorithm

CLC Number: TP391

Fig.1 Comparison of three methods

Fig.2 Comparison of classification results of three methods

Fig.3 Comparison of classification accuracy

Fig.4 Comparison of F1 values
