Journal of Jilin University (Information Science Edition) ›› 2023, Vol. 41 ›› Issue (4): 608-620.

Named Entity Recognition for High School Chemistry Exam Papers

ZHANG Lu 1, MA Zirui 2, WANG Yue 3, MA Cuiling

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; 2. School of Information Engineering, Ningxia University, Yinchuan 750021, China; 3. College of Computer Science and Technology, Jilin University, Changchun 130012, China; 4. Shizuishan No. 3 Middle School, Shizuishan 753000, China
  • Received: 2022-08-25 Online: 2023-08-16 Published: 2023-08-17

Abstract: Chinese chemical named entities follow no strict word-formation rules and may contain letters, numbers, and special symbols, and traditional word vector models cannot effectively distinguish nested or ambiguous entities in chemical terms. To address this, the named entities in high school chemistry test resources are divided into four categories (substances, properties, quantities, and experiments), and a chemistry vocabulary is constructed to assist manual labeling. The ALBERT pre-training model is then used to extract text features and generate dynamic word vectors, and named entity recognition is performed on the text of high school chemistry questions by combining it with the BiLSTM-CRF (Bidirectional Long Short-Term Memory with Conditional Random Field) model. The accuracy, recall, and F1 values of the proposed model reach 95.24%, 95.26%, and 95.25%, respectively.
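
As a quick consistency check on the figures reported in the abstract, the sketch below computes F1 as the harmonic mean of precision and recall. It assumes that the reported "accuracy" plays the role of precision, which the abstract does not state explicitly; the function and variable names are hypothetical and are not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (same units for both)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Figures from the abstract, in percent. Treating the reported "accuracy"
# as precision is an assumption of this sketch.
reported_precision = 95.24
reported_recall = 95.26

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 95.25
```

Under that assumption, the three reported values are mutually consistent to two decimal places.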

Key words: named entity recognition, A Lite BERT (ALBERT) pre-training model, bidirectional long short-term memory network, conditional random field (CRF), chemical resources text

CLC Number: 

  • TP391.1