Journal of Jilin University (Information Science Edition), 2025, Vol. 43, Issue (2): 401-411.


Multi-Feature Fusion Named Entity Recognition and Application in Oil and Gas Exploration Field

YUAN Man¹, ZHAO Xingyu¹, YUAN Jingshu¹, MA Zhuoran²

1. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China; 2. Class ASA, Daqing Royal Bridge Academy, Daqing 163458, China
Received: 2023-12-29; Online: 2025-04-08; Published: 2025-04-10

Abstract: Existing named entity recognition (NER) methods have difficulty identifying entities that combine multiple elements, as well as nested entities, in oil and gas exploration texts. To address these limitations, a multi-feature fusion NER approach is proposed, built on a BERT-CNN-BiGRU-Attention-CRF (Bidirectional Encoder Representations from Transformers-Convolutional Neural Network-Bidirectional Gated Recurrent Unit-Attention-Conditional Random Field) architecture. The model uses BERT's semantic extraction capability to obtain character vectors that carry global features of the entire sentence. It then uses a CNN, which is good at capturing local features, to obtain character-level vectors for words, compensating for the limitations of the BERT character vectors. Dictionary feature vectors are obtained by applying a bidirectional maximum matching method with a custom oil and gas exploration domain dictionary. These three types of vectors are concatenated and fed into the BiGRU-Attention-CRF model. Experiments on a self-constructed small-scale oil and gas exploration dataset yield an F1 score of 91.10%, and the model achieves better recognition performance than the other mainstream NER methods it is compared with. The model also provides useful support for constructing knowledge graphs in the oil and gas exploration domain.
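The dictionary-feature step can be illustrated with a short Python sketch of bidirectional maximum matching. The abstract does not say how the forward and backward results are reconciled or how matches are encoded as features, so this sketch assumes a common heuristic (prefer the segmentation with fewer words, then the one with fewer single-character words, and fall back to the backward result on a tie) and a per-character B/I/O tag turned into a one-hot vector. The function names, the tag scheme, and the toy dictionary are hypothetical and are not taken from the paper.

```python
from typing import List, Set

# Hypothetical toy dictionary; the paper's custom domain dictionary is not reproduced here.
DOMAIN_DICT = {"松辽盆地", "盆地", "砂岩", "储层", "砂岩储层"}

# Assumed B/I/O tag scheme for dictionary features (not specified in the abstract).
TAG_INDEX = {"B": 0, "I": 1, "O": 2}


def forward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Left-to-right greedy segmentation: at each position take the longest dictionary word."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan can advance.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Right-to-left greedy segmentation: at each end position take the longest dictionary word."""
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the sentence
    return words


def bidirectional_max_match(text: str, vocab: Set[str]) -> List[str]:
    """Run both directions and pick one result using an assumed tie-breaking heuristic."""
    max_len = max((len(w) for w in vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    # Prefer the segmentation with fewer words overall.
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    # Then prefer fewer single-character words; ties go to the backward result.
    singles_fwd = sum(1 for w in fwd if len(w) == 1)
    singles_bwd = sum(1 for w in bwd if len(w) == 1)
    return fwd if singles_fwd < singles_bwd else bwd


def dictionary_tags(text: str, vocab: Set[str]) -> List[str]:
    """Tag each character B/I if it lies in a matched dictionary word, otherwise O."""
    tags: List[str] = []
    for word in bidirectional_max_match(text, vocab):
        if word in vocab:
            tags.append("B")
            tags.extend(["I"] * (len(word) - 1))
        else:
            tags.extend(["O"] * len(word))
    return tags


def dictionary_feature_vectors(text: str, vocab: Set[str]) -> List[List[int]]:
    """Turn the per-character tags into one-hot vectors (one row per character)."""
    vectors = []
    for tag in dictionary_tags(text, vocab):
        one_hot = [0] * len(TAG_INDEX)
        one_hot[TAG_INDEX[tag]] = 1
        vectors.append(one_hot)
    return vectors


if __name__ == "__main__":
    sentence = "松辽盆地发育砂岩储层"
    print(bidirectional_max_match(sentence, DOMAIN_DICT))
    print(dictionary_tags(sentence, DOMAIN_DICT))
```

Running the script segments the sentence as ['松辽盆地', '发', '育', '砂岩储层'], so 松辽盆地 and 砂岩储层 are tagged B I I I while 发 and 育, which are not in the toy dictionary, are tagged O. In a model like the one described, these per-character dictionary vectors would be concatenated with the BERT and CNN vectors before the BiGRU layer.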

Key words: named entity recognition, oil and gas exploration, knowledge graph, BERT, convolutional neural network, dictionary feature

CLC Number: TP391.1