Journal of Jilin University (Information Science Edition), 2022, Vol. 40, Issue (2): 188-197.

Similarity Analysis of Petroleum Drilling Literature Based on Improved Siamese BERT Networks

ZHANG Yan 1a, WANG Bin 1a, YANG Qingchuan 2, LI Wei 1b

  1. a. School of Computer and Information Technology; b. College of Petroleum Engineering, Northeast Petroleum University, Daqing 163318, Heilongjiang, China; 2. Data Management Center, Anda Qingxin Oilfield Development Company Limited, Anda 151413, Heilongjiang, China
  • Received: 2021-07-14 Online: 2022-06-11 Published: 2022-06-11
  • About the author: ZHANG Yan (born 1980), male, from Dalian, Liaoning; associate professor at Northeast Petroleum University, Ph.D.; research interests: artificial intelligence and big data. (Tel) 86-13644598086, (E-mail) zhangyuanyuan_309@126.com.
  • Supported by:
    National Natural Science Foundation of China (61873058); Key Program of the Natural Science Foundation of Heilongjiang Province (ZD2019F001)

Abstract: In the petroleum drilling domain, traditional retrieval methods return results that deviate substantially from what the user intends, because search terms are nonstandard and their semantics are ambiguous. To address this, an attention pooling method based on a Siamese BERT (Bidirectional Encoder Representations from Transformers) network model is proposed to improve the accuracy of literature similarity evaluation. First, web crawling is used to collect petroleum drilling literature, which is then cleaned and organized. Next, the samples are scored and labeled using five categories of evaluation indicators for the petroleum drilling literature dataset. Finally, drawing on the characteristics of the drilling literature dataset, an attention pooling method based on the Siamese BERT network is proposed to produce an overall semantic representation of multi-feature samples. The experimental results show that, compared with conventional pooling methods, the model improves similarity measurement for petroleum drilling literature and shows a degree of generalization ability.

Key words: literature similarity; bidirectional encoder representations from transformers (BERT) network; petroleum drilling literature; attention pooling

CLC number:

  • TP391
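
The abstract contrasts attention pooling with conventional pooling inside a Siamese arrangement, where both documents pass through the same encoder and pooling step before their vectors are compared. The following is a minimal sketch of that general idea only, not the authors' implementation: the abstract does not give the attention formulation, so the `query` vector, the toy token embeddings (standing in for BERT outputs), and all function names here are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_pool(tokens):
    """Conventional pooling: average the token vectors with equal weight."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def attention_pool(tokens, query):
    """Weight each token vector by softmax(token . query), then sum.

    `query` is a hypothetical learned vector (an assumption); the paper's
    actual attention parameterization is not stated in the abstract.
    """
    weights = softmax([dot(t, query) for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Toy stand-ins for the token embeddings of two documents (assumed values).
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_b = [[0.9, 0.1], [0.2, 0.8]]
query = [1.0, 0.0]  # assumed; in practice this would be learned

# Siamese use: the same pooling and the same parameters for both inputs.
vec_a = attention_pool(doc_a, query)
vec_b = attention_pool(doc_b, query)
similarity = cosine(vec_a, vec_b)
```

With an all-zero `query` every token receives the same weight, so `attention_pool` reduces to `mean_pool`; a non-zero `query` lets some tokens contribute more to the document vector than others, which is the difference from equal-weight pooling that this sketch is meant to show.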