Journal of Jilin University (Science Edition) ›› 2020, Vol. 58 ›› Issue (4): 923-930.

• Computer Science •

Short Text Classification Model Based on Improved Convolutional Neural Network

GAO Yunlong1,2, WU Chuan1, ZHU Ming1   

  1. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China;
    2. Key Laboratory of Airborne Optical Imaging and Measurement, Chinese Academy of Sciences, Changchun 130033, China
  • Received: 2019-11-13 Online: 2020-07-26 Published: 2020-07-16
  • Contact: WU Chuan E-mail: wuchuan0458@163.com

Abstract: We propose a short text classification model based on an improved convolutional neural network. First, different encoding methods are used to map short text to distributed representations in different spaces, and numerical features of different granularities are extracted as the multi-channel input of the classification model; concept features are also extracted from a standard knowledge base as prior knowledge, which improves the semantic representation of short text. Second, an autoencoder learning strategy is added to the fully connected layer: starting from an approximate identity mapping, it further combines the numerical features to model the correlations within the data. Finally, a sparsity constraint based on relative entropy is imposed on the model, which reduces model complexity while improving generalization. Short text classification experiments on open-source data verify the effectiveness of the proposed model.
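
The abstract does not give the form of the relative-entropy sparsity constraint. As an illustration only, the sketch below uses the penalty commonly applied in sparse autoencoders: the Kullback-Leibler divergence between a target activation rate and the mean activation of each hidden unit. The function name `kl_sparsity_penalty`, the target rate `rho`, and the clipping constant `eps` are assumptions made for this example and are not taken from the paper.

```python
import math


def kl_sparsity_penalty(activations, rho=0.05, eps=1e-8):
    """Sum over hidden units of KL(rho || rho_hat_j).

    activations: list of rows, one per sample; each row holds the
    hidden-unit activations for that sample, with values in (0, 1).
    rho: assumed target mean activation (hypothetical default).
    """
    n_samples = len(activations)
    n_units = len(activations[0])
    penalty = 0.0
    for j in range(n_units):
        # Mean activation of hidden unit j over the batch.
        rho_hat = sum(row[j] for row in activations) / n_samples
        # Clip away from 0 and 1 so the logarithms stay finite.
        rho_hat = min(max(rho_hat, eps), 1.0 - eps)
        penalty += (rho * math.log(rho / rho_hat)
                    + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))
    return penalty


if __name__ == "__main__":
    sparse_batch = [[0.05, 0.04], [0.05, 0.06]]
    dense_batch = [[0.90, 0.80], [0.70, 0.95]]
    print(kl_sparsity_penalty(sparse_batch))
    print(kl_sparsity_penalty(dense_batch))
```

In a sparse autoencoder this term would typically be added to the training loss with a weighting coefficient, so that hidden units whose average activation drifts away from `rho` are penalized; whether the paper weights or applies it in this way is not stated in the abstract.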

Key words: convolutional neural network, short text, concept distributed representation, sparsity, autoencoder

CLC Number:

  • TP181