Journal of Jilin University (Science Edition) ›› 2022, Vol. 60 ›› Issue (6): 1399-1406.

Semantic Similarity Calculation Method of News Text and Comment Integrated with News Title Information

LI Yitong1, WANG Hongbin1, CHENG Liang2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China; 2. College of City, Kunming University of Science and Technology, Kunming 650051, China
  • Received: 2021-08-27 Online: 2022-11-26 Published: 2022-11-26
  • Corresponding author: WANG Hongbin E-mail: whbin2007@126.com

Abstract: Pre-trained models truncate long texts such as news articles, which causes a loss of text information. To address this problem, we propose a method that, on the basis of integrating news title information, builds a model by combining the TextRank algorithm, the latent Dirichlet allocation (LDA) topic model, and a pre-trained model, and we compare this model with other semantic similarity calculation methods. The results show that the model achieves an accuracy of 82.46%, a recall of 87.43%, a precision of 82.68%, and an F1 value of 84.99%, the best results among the compared methods, which effectively improves the performance of semantic similarity calculation between news texts and comments.

Key words: semantic similarity, pre-trained model, latent Dirichlet allocation, news comment
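
As a quick consistency check on the reported metrics, the F1 value can be recomputed as the harmonic mean of precision and recall. The sketch below takes 82.68% as the precision and 87.43% as the recall, as stated in the abstract. The function name `f1_score` is only an illustrative helper, not code from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
precision = 0.8268
recall = 0.8743

f1 = f1_score(precision, recall)
print(f"F1 = {f1 * 100:.2f}%")  # prints "F1 = 84.99%"
```

The recomputed value agrees with the reported F1 of 84.99%, which supports reading 82.68% as the precision rather than a second accuracy figure.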

CLC Number:

  • TP391.1