Short Text Classification Model Based on Integrated Neural Networks

Journal of Jilin University (Science Edition)

GAO Yunlong^1,2, ZUO Wanli^1,2, WANG Ying^1,2, WANG Xin^2,3

1. College of Computer Science and Technology, Jilin University, Changchun 130012, China; 2. Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China; 3. School of Computer Technology and Engineering, Changchun Institute of Technology, Changchun 130012, China

Received: 2017-05-25; Online: 2018-07-26; Published: 2018-07-31
Contact: ZUO Wanli, E-mail: zuowl@jlu.edu.cn

Abstract

Abstract: Short texts are sparse and contain few words, which makes them hard to classify. To address this, we propose a short text classification model based on integrated neural networks. First, extended word vectors are used as the model input, so that the numerical word vectors can describe the morphological, syntactic, and semantic features of a short text. Second, a recurrent neural network (RNN) models the semantics of the short text and captures the dependencies within its internal structure. Finally, during training, a regularization term is used to select the model that minimizes empirical risk and model complexity together. Short text classification experiments on a corpus show that the proposed model achieves good classification performance, handles short text input of variable length, and is robust.

Key words: short text, classification, extended word vector, integrated neural network
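
The three steps in the abstract (word vectors as input, an RNN over the token sequence, and a regularized training objective) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the extended word vector construction, the RNN variant, or the form of the regularization term, so the sketch assumes a plain embedding table in place of extended word vectors, a simple Elman-style RNN, and an L2 weight penalty. All function names, parameter names, and dimensions are hypothetical.

```python
import numpy as np


def init_params(vocab_size, embed_dim, hidden_dim, num_classes, seed=0):
    """Randomly initialise a toy model (all shapes are illustrative)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        # Assumption: a plain embedding table stands in for extended word vectors.
        "E": rng.normal(0.0, scale, (vocab_size, embed_dim)),
        "W_xh": rng.normal(0.0, scale, (embed_dim, hidden_dim)),
        "W_hh": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),
        "b_h": np.zeros(hidden_dim),
        "W_hy": rng.normal(0.0, scale, (hidden_dim, num_classes)),
        "b_y": np.zeros(num_classes),
    }


def forward(params, token_ids):
    """Run an Elman-style RNN over one short text and return class probabilities.

    The loop runs once per token, so inputs of any length are accepted.
    """
    h = np.zeros(params["W_hh"].shape[0])
    for t in token_ids:
        x = params["E"][t]
        h = np.tanh(x @ params["W_xh"] + h @ params["W_hh"] + params["b_h"])
    logits = h @ params["W_hy"] + params["b_y"]
    logits = logits - logits.max()  # stabilise the softmax
    probs = np.exp(logits)
    return probs / probs.sum()


def regularized_loss(params, batch, labels, lam=1e-3):
    """Empirical risk (mean negative log-likelihood) plus a complexity penalty.

    Assumption: the penalty is an L2 norm on the weight matrices; the abstract
    only says that a regularization term is used.
    """
    nll = -np.mean(
        [np.log(forward(params, ids)[y]) for ids, y in zip(batch, labels)]
    )
    penalty = sum(np.sum(params[k] ** 2) for k in ("W_xh", "W_hh", "W_hy"))
    return nll + lam * penalty


if __name__ == "__main__":
    params = init_params(vocab_size=50, embed_dim=8, hidden_dim=16, num_classes=3)
    batch = [[1, 4, 7], [2, 9, 3, 5, 11]]  # two short texts of different lengths
    labels = [0, 2]
    print(forward(params, batch[0]))
    print(regularized_loss(params, batch, labels))
```

Because the final hidden state summarises the whole sequence, the two texts in `batch` need no padding even though their lengths differ. Raising `lam` shifts the objective toward smaller weights at the cost of a higher empirical risk.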

CLC number: TP181

GAO Yunlong, ZUO Wanli, WANG Ying, WANG Xin. Short Text Classification Model Based on Integrated Neural Networks[J]. Journal of Jilin University (Science Edition), 2018, 56(4): 933-938.
