
Journal of Jilin University (Information Science Edition) ›› 2022, Vol. 40 ›› Issue (3): 464-470.


Literature Relevance Ranking Method Based on Improved PageRank Algorithm

NIE Yongdan, WANG Bin, ZHANG Yan

  1. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China
  • Received: 2021-12-21 Online: 2022-07-14 Published: 2022-07-15
  • About the author: NIE Yongdan (born 1980), female, from Meihekou, Jilin; associate professor at Northeast Petroleum University; research interests: artificial intelligence and big data analysis. (Tel) 86-18345978528 (E-mail) nieydzy@163.com.
  • Supported by:
    China Postdoctoral Science Foundation (2019M651254); Youth Science Foundation of Northeast Petroleum University (2018QNL-49)

Abstract: In scientific and technical literature retrieval, ranking results by relevance from a domain-specific perspective is an important task. The traditional PageRank algorithm distributes similarity weights evenly, which can produce unreasonable literature rankings. To address this, an algorithm that combines deep learning with PageRank is proposed to improve the reliability of literature relevance ranking. First, a Siamese BERT (Bidirectional Encoder Representation from Transformers) network with attention pooling computes the similarity between a paper and its citations. Next, the similarities between a paper and the citations it contains are normalized. Finally, the normalized similarities are used as distribution weights when computing the ranking over the citation network. Experimental results show that, compared with the traditional PageRank algorithm, the method improves the relevance of retrieval results by more than 6%, making it better suited to citation network analysis of scientific and technical literature.

Key words: PageRank, literature similarity, bidirectional encoder representation from transformers (BERT), paper ranking

CLC number: 

  • TP391.1
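
The weighting idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `weighted_pagerank`, the sum-to-one normalization, the damping factor of 0.85, the uniform handling of papers without citations, and the similarity values are all assumptions made for illustration. In the paper the similarities come from a Siamese BERT network; here they are simply supplied as hypothetical numbers.

```python
from typing import Dict


def weighted_pagerank(
    citations: Dict[str, Dict[str, float]],
    damping: float = 0.85,   # assumed value, not taken from the paper
    tol: float = 1e-10,
    max_iter: int = 200,
) -> Dict[str, float]:
    """Rank papers in a citation network.

    citations[p][q] is a non-negative similarity between paper p and a
    paper q that p cites (hypothetical inputs in this sketch).
    """
    nodes = sorted(set(citations) | {q for outs in citations.values() for q in outs})
    n = len(nodes)

    # Normalize each paper's outgoing similarities so they sum to 1
    # (assumed normalization; the abstract does not specify the formula).
    weights: Dict[str, Dict[str, float]] = {}
    for p in nodes:
        outs = citations.get(p, {})
        total = sum(outs.values())
        weights[p] = {q: s / total for q, s in outs.items()} if total > 0 else {}

    rank = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Papers with no usable citations spread their rank uniformly.
        dangling = sum(rank[p] for p in nodes if not weights[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in nodes}
        # Each paper passes rank to its citations in proportion to similarity.
        for p in nodes:
            for q, w in weights[p].items():
                new_rank[q] += damping * rank[p] * w
        delta = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical three-paper network: A cites B and C, B cites C.
uniform = weighted_pagerank({"A": {"B": 0.5, "C": 0.5}, "B": {"C": 1.0}})
skewed = weighted_pagerank({"A": {"B": 0.9, "C": 0.1}, "B": {"C": 1.0}})
print(sorted(skewed.items(), key=lambda kv: kv[1], reverse=True))
```

With equal similarities the sketch behaves like PageRank with evenly split weights; when A is much more similar to B than to C, B receives a larger share of A's rank, which is the effect the abstract attributes to similarity-based weight distribution.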