Journal of Jilin University (Science Edition) ›› 2021, Vol. 59 ›› Issue (4): 922-928.


Stack Overflow Question Post Classification Method Based on Deep Learning

YANG Guang1, JIA Yanxin1, CHEN Xiang1,2, XU Shuyuan1

  1. School of Information Science and Technology, Nantong University, Nantong 226019, Jiangsu Province, China;
  2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
  • Received: 2020-06-10 Online: 2021-07-26 Published: 2021-07-26
  • Corresponding author: CHEN Xiang E-mail: xchencs@ntu.edu.cn

Abstract: Classification methods based on regular expressions suffer from the difficulty of extracting patterns by hand, and methods based on traditional machine learning suffer from a performance bottleneck. To address these problems, we propose a deep learning-based method for classifying question posts, which builds classification models with the deep text mining model TextCNN and with TextRNN combined with an attention mechanism. The experimental results show that the deep learning-based method outperforms the existing baseline methods on most question purpose categories, that the Adam optimizer outperforms the SGD optimizer, and that GloVe pre-trained word vectors outperform randomly generated word vectors. The method classifies posts by the purpose of the question, which can add a new dimension to the analysis of post discussion topics on Stack Overflow (SO).

Key words: question purpose of posts, deep learning, text mining, word vector

CLC Number (Chinese Library Classification):

  • TP311.5