Journal of Jilin University (Information Science Edition) ›› 2021, Vol. 39 ›› Issue (5): 553-561.


CBLGA and CBLCA Hybrid Models for Long and Short Text Classification

WANG Deqiang, WU Jun, WANG Liping   

  1. Department of Mechanical Engineering, Tsinghua University, Beijing 100084, China
  • Received: 2021-05-04  Online: 2021-10-01  Published: 2021-10-01

Abstract: With the development of information technology, large volumes of text must be classified in many industries. To improve classification accuracy and efficiency at the same time, a CNN-BiLSTM/BiGRU hybrid text classification model based on the attention mechanism (CBLGA) is proposed. Parallel CNNs (Convolutional Neural Networks) with different window sizes extract a variety of local text features, and the resulting representations are fed into a parallel BiLSTM/BiGRU model, which extracts global features related to the whole text context; the features of the two branches are then fused and an attention mechanism is applied. Secondly, a CNN-BiLSTM/CNN hybrid text classification model based on the attention mechanism (CBLCA) is proposed. Its distinctive feature is that the CNN output is divided into two parts: one part is fed into the BiLSTM network, and the other is concatenated with the output of the BiLSTM network, thus retaining both the local text features extracted by the CNN and the global text features extracted by the BiLSTM. Experiments show that the CBLGA and CBLCA models achieve effective improvements in both accuracy and efficiency. Finally, a set of preprocessing methods for texts of different lengths is established, so that the models improve the accuracy and efficiency of text classification for both long and short texts.
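To make the two architectures described above concrete, the sketch below outlines one possible PyTorch implementation of the CBLGA and CBLCA ideas: parallel CNNs followed by parallel BiLSTM/BiGRU layers with fused features and attention (CBLGA), and a CNN whose output is split so that one part feeds a BiLSTM while the other is concatenated with the BiLSTM output (CBLCA). All layer sizes, window sizes, the channel-wise split in CBLCA, and the additive attention form are assumptions for illustration, not the authors' exact configuration.

```python
# Hypothetical sketch of the CBLGA and CBLCA architectures from the abstract.
# Hyperparameters and the exact fusion/splitting scheme are assumed, not taken
# from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveAttention(nn.Module):
    """Scores each time step and returns an attention-weighted sum of the sequence."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                          # x: (batch, seq, dim)
        weights = torch.softmax(self.score(x), dim=1)
        return (weights * x).sum(dim=1)            # (batch, dim)


class CBLGA(nn.Module):
    """Parallel CNNs (several window sizes) -> parallel BiLSTM/BiGRU -> fuse -> attention."""
    def __init__(self, vocab_size, emb_dim=128, n_filters=64,
                 windows=(2, 3, 4), hidden=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k, padding=k // 2) for k in windows)
        conv_dim = n_filters * len(windows)
        self.bilstm = nn.LSTM(conv_dim, hidden, batch_first=True, bidirectional=True)
        self.bigru = nn.GRU(conv_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = AdditiveAttention(4 * hidden)  # BiLSTM (2h) + BiGRU (2h) after fusion
        self.fc = nn.Linear(4 * hidden, n_classes)

    def forward(self, tokens):                     # tokens: (batch, seq)
        x = self.emb(tokens).transpose(1, 2)       # (batch, emb, seq) for Conv1d
        # Local features from parallel convolutions with different window sizes.
        local = torch.cat([F.relu(c(x))[:, :, :tokens.size(1)] for c in self.convs], dim=1)
        local = local.transpose(1, 2)              # (batch, seq, conv_dim)
        lstm_out, _ = self.bilstm(local)           # global context features
        gru_out, _ = self.bigru(local)             # global context features
        fused = torch.cat([lstm_out, gru_out], dim=-1)
        return self.fc(self.attn(fused))


class CBLCA(nn.Module):
    """CNN output split in two: one half feeds a BiLSTM, the other half is
    concatenated with the BiLSTM output, keeping local and global features."""
    def __init__(self, vocab_size, emb_dim=128, n_filters=128,
                 window=3, hidden=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, window, padding=window // 2)
        self.bilstm = nn.LSTM(n_filters // 2, hidden, batch_first=True, bidirectional=True)
        self.attn = AdditiveAttention(2 * hidden + n_filters // 2)
        self.fc = nn.Linear(2 * hidden + n_filters // 2, n_classes)

    def forward(self, tokens):
        x = self.emb(tokens).transpose(1, 2)
        feats = F.relu(self.conv(x))[:, :, :tokens.size(1)].transpose(1, 2)
        # Split the CNN features (assumed channel-wise): one part goes through
        # the BiLSTM, the other is kept as-is and merged back afterwards.
        to_lstm, kept = feats.chunk(2, dim=-1)
        lstm_out, _ = self.bilstm(to_lstm)
        merged = torch.cat([lstm_out, kept], dim=-1)
        return self.fc(self.attn(merged))
```

As a quick check, `CBLGA(vocab_size=10000)(torch.randint(0, 10000, (8, 50)))` and the corresponding CBLCA call each return class logits of shape (8, 2) for a batch of eight 50-token sequences.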

Key words: CBLGA model, CBLCA model, attention mechanism, hybrid model, text classification

CLC Number: TP391