Journal of Jilin University (Information Science Edition), 2025, Vol. 43, Issue (2): 401-411.

Multi-Feature Fusion Named Entity Recognition and Application in Oil and Gas Exploration Field

YUAN Man¹, ZHAO Xingyu¹, YUAN Jingshu¹, MA Zhuoran²

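  1. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China; 2. Class ASA, Daqing Royal Bridge Academy, Daqing 163458, Heilongjiang, China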
  • Received: 2023-12-29  Online: 2025-04-08  Published: 2025-04-10
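  • About the author: YUAN Man (1965–), male, born in Nong'an, Jilin; professor at Northeast Petroleum University, mainly engaged in research on knowledge organization, cognitive science, data science and standardization. Tel: 86-15765959186; E-mail: yuanman@nepu.edu.cn.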
  • Funding:
    Hainan Provincial Philosophy and Social Sciences Planning Project (HNSK(QN)24-53)

Abstract: Existing named entity recognition (NER) methods have limitations in identifying entities composed of multiple elements, as well as nested entities, in oil and gas exploration texts. To address this, a multi-feature fusion NER method based on a BERT-CNN-BiGRU-Attention-CRF (Bidirectional Encoder Representations from Transformers-Convolutional Neural Network-Bidirectional Gated Recurrent Unit-Attention-Conditional Random Field) architecture is proposed. The model uses BERT's semantic extraction capability to obtain character vectors that carry sentence-level global features, and uses the CNN's ability to capture local features to overcome the limitations of the BERT character vectors and obtain character-level vectors for words. Dictionary feature vectors are obtained by applying bidirectional maximum matching against a self-built oil and gas exploration domain dictionary. These three types of vectors are concatenated and used as the input to the BiGRU-Attention-CRF model. On a self-constructed small-scale oil and gas exploration dataset, the model achieves an F1 score of 91.10%, outperforming the other mainstream NER methods it was compared with. The model also provides valuable support for constructing knowledge graphs in the oil and gas exploration domain.
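The dictionary-feature step described above (bidirectional maximum matching against a domain dictionary) can be illustrated with a minimal Python sketch. The abstract does not say how disagreements between the forward and backward segmentations are resolved or how matches are encoded as features, so both are assumptions here: the sketch uses a common heuristic (prefer fewer words, then fewer single characters, otherwise the backward result) and tags each character B/I/E/S when it falls inside a dictionary word and O otherwise. The function names and the sample vocabulary are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Set


def forward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Greedy right-to-left segmentation: take the longest dictionary word ending at each position."""
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    words.reverse()  # pieces were collected starting from the right end
    return words


def bidirectional_max_match(text: str, vocab: Set[str], max_len: Optional[int] = None) -> List[str]:
    """Run both directions and pick one result (tie-breaking rule is an assumed, common heuristic)."""
    if max_len is None:
        max_len = max((len(w) for w in vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd  # fewer words wins

    def singles(ws: List[str]) -> int:
        return sum(1 for w in ws if len(w) == 1)

    # Same word count: fewer single-character pieces wins; remaining ties go to the backward result.
    return fwd if singles(fwd) < singles(bwd) else bwd


def dictionary_tags(text: str, vocab: Set[str], max_len: Optional[int] = None) -> List[str]:
    """Per-character dictionary feature: B/I/E/S inside dictionary words, O elsewhere (assumed scheme)."""
    tags: List[str] = []
    for word in bidirectional_max_match(text, vocab, max_len):
        if word not in vocab:
            tags.append("O")  # non-dictionary pieces are always a single character
        elif len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


if __name__ == "__main__":
    # Hypothetical domain dictionary entries, for illustration only.
    vocab = {"致密", "致密砂岩", "砂岩储层", "储层"}
    print(bidirectional_max_match("致密砂岩储层", vocab))  # ['致密', '砂岩储层']
    print(dictionary_tags("致密砂岩储层", vocab))  # ['B', 'E', 'B', 'I', 'I', 'E']
```

In this example the forward pass yields 致密砂岩 / 储层 and the backward pass yields 致密 / 砂岩储层; both have two words and no single characters, so the assumed rule falls back to the backward result. In a model like the one described, such per-character tags would typically be mapped to embeddings before concatenation with the BERT and CNN vectors, but the paper's exact encoding is not given in the abstract.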

Key words: named entity recognition, oil and gas exploration, knowledge graph, BERT, convolutional neural network, dictionary feature

CLC number: TP391.1