Journal of Jilin University (Science Edition)

• Computer Science •

Short Text Classification Model Based on Integrated Neural Networks

GAO Yunlong1,2, ZUO Wanli1,2, WANG Ying1,2, WANG Xin2,3   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China; 2. Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China; 3. School of Computer Technology and Engineering, Changchun Institute of Technology, Changchun 130012, China
  • Received: 2017-05-25 Online: 2018-07-26 Published: 2018-07-31
  • Contact: ZUO Wanli E-mail: zuowl@jlu.edu.cn

Abstract: Short texts are highly sparse and contain few words. To better handle short text classification under these conditions, we propose a short text classification model based on integrated neural networks. First, extended word vectors are used as the model input, so that the numerical word vectors can effectively describe the morphological, syntactic, and semantic features of short text. Second, a recurrent neural network (RNN) models the semantics of the short text and captures the dependencies within its internal structure. Finally, during training, a regularization term is used to select the model that minimizes empirical risk and model complexity simultaneously. Short text classification experiments on a corpus show that the proposed model classifies well, handles variable-length short text input, and is robust.

Key words: short text, classification, extended word vector, integrated neural network
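
The pipeline summarized in the abstract (word vectors in, an RNN over the sequence, a regularized training objective) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a plain Elman-style RNN, a softmax output on the final hidden state, and an L2 penalty as the regularization term. The dimensions, the random vectors standing in for extended word vectors, and the names `rnn_classify` and `regularized_loss` are all hypothetical.

```python
import math
import random


def rnn_classify(vectors, W_xh, W_hh, W_hy):
    """Run a plain Elman RNN over a variable-length sequence of word
    vectors and return class probabilities from the final hidden state."""
    hidden = [0.0] * len(W_hh)
    for x in vectors:
        # h_t = tanh(W_xh x_t + W_hh h_{t-1}); bias terms omitted for brevity
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_xh[j], x))
                + sum(w * hi for w, hi in zip(W_hh[j], hidden))
            )
            for j in range(len(W_hh))
        ]
    logits = [sum(w * hi for w, hi in zip(row, hidden)) for row in W_hy]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_loss(probs, label, weights, lam):
    """Cross-entropy (empirical risk) plus an L2 penalty (assumed here as
    the measure of model complexity)."""
    l2 = sum(w * w for matrix in weights for row in matrix for w in row)
    return -math.log(probs[label]) + lam * l2


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
EMBED, HIDDEN, CLASSES = 4, 3, 2  # illustrative sizes, not from the paper
W_xh = random_matrix(HIDDEN, EMBED, rng)
W_hh = random_matrix(HIDDEN, HIDDEN, rng)
W_hy = random_matrix(CLASSES, HIDDEN, rng)

# Random stand-ins for extended word vectors of a 3-word and a 7-word text.
short_text = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(3)]
longer_text = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(7)]

for text in (short_text, longer_text):
    probs = rnn_classify(text, W_xh, W_hh, W_hy)
    loss = regularized_loss(probs, 0, [W_xh, W_hh, W_hy], lam=0.01)
    print(len(text), [round(p, 3) for p in probs], round(loss, 3))
```

Because the recurrence consumes one word vector per step, the same weights apply to inputs of any length, which is what allows variable-length short texts to be handled without padding in this sketch.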

CLC number: TP181