Journal of Jilin University Science Edition ›› 2023, Vol. 61 ›› Issue (4): 909-914.


Short Text Semantic Similarity Measurement Algorithm Based on Hybrid Machine Learning Model

HAN Kaixu1, YUAN Shufang2   

  1. College of Electronics and Information Engineering, Beibu Gulf University, Qinzhou 535011, Guangxi Zhuang Autonomous Region, China; 2. College of Sciences, Beibu Gulf University, Qinzhou 535011, Guangxi Zhuang Autonomous Region, China
  • Received: 2022-04-15 Online: 2023-07-26 Published: 2023-07-26

Abstract: To improve the accuracy of short text semantic similarity measurement, we designed a short text semantic similarity measurement algorithm based on a hybrid machine learning model. First, we preprocessed the short text, constructed a word vector model of the short text based on the hybrid machine learning model, and extended the features of the short text. Second, we combined the various metric features of the short text and applied dimensionality reduction to them. Finally, we constructed an ensemble learning model to calculate the semantic similarity and thereby complete the semantic similarity measurement. We tested the performance of the method on the “Quora Question Pairs” competition dataset. The test results show that the accuracy of the method is high and that both the logarithmic loss and the mean square error of the measurement are low, indicating that the method measures similarity accurately.
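
The three evaluation measures named in the abstract (accuracy, logarithmic loss, and mean square error) can be illustrated with a minimal sketch. It assumes binary duplicate/non-duplicate labels and predicted probabilities, as in a question-pair task; the function names, the 0.5 decision threshold, the clipping constant, and the toy values are illustrative assumptions and are not taken from the paper.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of pairs whose thresholded prediction matches the label.

    The 0.5 threshold is an assumption, not a value from the paper.
    """
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def mean_squared_error(y_true, y_prob):
    """Mean squared difference between labels and predicted probabilities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Toy labels and predicted similarity probabilities (made up for illustration).
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4]

acc = accuracy(y_true, y_prob)
ll = log_loss(y_true, y_prob)
mse = mean_squared_error(y_true, y_prob)
print(acc, ll, mse)
```

Higher accuracy is better, while lower logarithmic loss and lower mean square error indicate that the predicted probabilities lie closer to the true labels.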

Key words: hybrid machine learning model, short text, text segmentation, semantic similarity, Chi-square test, similarity measurement

CLC Number: TP391