Journal of Jilin University (Engineering and Technology Edition), 2026, Vol. 56, Issue (1): 231-238. doi: 10.13229/j.cnki.jdxbgxb.20241174


Chinese named entity recognition algorithm with soft attention mask embedding

Xiu-hui WANG1, Yong-bo XU2

  1. School of Computer and Network Engineering, Shanxi Datong University, Datong 037009, China
  2. College of Artificial Intelligence, Henan University, Zhengzhou 450046, China
  • Received: 2024-10-10 Online: 2026-01-01 Published: 2026-02-03

Abstract:

Chinese vocabulary is semantically ambiguous: the same word can carry different meanings in different contexts, and different words and phrases contribute unequally to named entity recognition. Chinese text also contains features that have little relevance to named entity recognition. Unless these features are weighted or masked, they interfere with the recognition accuracy of the model. To address this, a Chinese named entity recognition (CNER) algorithm with soft attention mask embedding is studied. A multi-level CNER model is established. In the word vector representation layer, jieba is used to segment the Chinese text passed from the input layer, and the Word2Vec method is used to obtain a word vector for each word, forming a sequence of word vectors. In the BiLSTM layer, bidirectional long short-term memory processing is applied to this sequence to obtain, for each word vector, a feature vector that fuses contextual information. A soft attention mask module is embedded after the BiLSTM layer; its soft attention mechanism weights and masks the feature vectors output by the BiLSTM layer, focusing on features that contribute significantly to entity recognition and removing or suppressing unimportant ones, which improves recognition accuracy. In the CRF layer, the feature vectors processed by the soft attention mask module are labeled and decoded to obtain the optimal entity label sequence, which is the Chinese named entity recognition result. Experiments show that the algorithm recognizes Chinese named entities accurately and performs well in entity label annotation coverage and F1 value.
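
The weighting and masking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the module's actual formulation, so everything here is an assumption made for illustration: the dot-product scoring against a hypothetical `query` vector, the softmax normalization, and the `threshold` below which a position is masked out.

```python
import math


def soft_attention_mask(features, query, threshold=0.1):
    """Weight each feature vector by a softmax attention score.

    ASSUMPTION: dot-product scoring, softmax weights, and a fixed
    threshold are illustrative choices, not the paper's formulation.
    """
    # One relevance score per position (dot product with the query vector).
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]

    # Numerically stable softmax over the sequence positions.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Scale positions by their weight; zero out those below the threshold.
    masked = [
        [w * x if w >= threshold else 0.0 for x in feat]
        for feat, w in zip(features, weights)
    ]
    return weights, masked


# Toy stand-ins for BiLSTM outputs: three positions, two dimensions each.
features = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
query = [1.0, 1.0]
weights, masked = soft_attention_mask(features, query)
print(weights)
print(masked)
```

In this toy input the third position scores far higher than the other two, so it keeps almost all of its magnitude while the first two are masked to zero. In a trained model the scoring parameters would be learned rather than fixed.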

Key words: Chinese naming, soft attention, entity recognition, mask operation, Word2Vec, BiLSTM model

CLC Number: TP391.1

Fig.1 CNER model design

Fig.2 Structure diagram of CBOW model

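Table 1 Main parameters of experiment

Name                             Value
Word vector dimension            200
CBOW window size                 7
LSTM unit hidden layer           60
Number of hidden-layer neurons   256
Learning rate                    0.1
Dropout                          0.5
Batch size                       32
Maximum number of iterations     60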
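
For reproduction, the values in Table 1 could be collected in a single configuration object. The sketch below is one possible layout; the class name and field names are hypothetical, and only the numeric values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters transcribed from Table 1 (field names are hypothetical)."""

    word_vector_dim: int = 200    # word vector dimension
    cbow_window: int = 7          # CBOW window size
    lstm_hidden_layer: int = 60   # "LSTM unit hidden layer"; exact meaning not stated in the source
    hidden_neurons: int = 256     # number of hidden-layer neurons
    learning_rate: float = 0.1
    dropout: float = 0.5
    batch_size: int = 32
    max_iterations: int = 60      # maximum number of iterations


config = ExperimentConfig()
print(config)
```

Freezing the dataclass keeps the settings from being changed accidentally during a run.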

Fig.3 Chinese case data collection platform

Table 2 Distribution of Chinese case entities

Entity                Training set   Test set
Person name (Name)    967            290
Symptom (Symptom)     3684           1105
Body part (Body)      613            184
Examination (Test)    2846           854
Drug (Drug)           1057           317
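
The split in Table 2 can be checked with a few lines of arithmetic. The sketch below only sums the counts in the table and computes the test-to-training ratio per entity type; the variable and function names are hypothetical.

```python
# Entity counts transcribed from Table 2: (training set, test set).
ENTITY_COUNTS = {
    "Name": (967, 290),
    "Symptom": (3684, 1105),
    "Body": (613, 184),
    "Test": (2846, 854),
    "Drug": (1057, 317),
}


def summarize(counts):
    """Return total train/test counts and the test-to-train ratio per entity type."""
    train_total = sum(train for train, _ in counts.values())
    test_total = sum(test for _, test in counts.values())
    ratios = {name: test / train for name, (train, test) in counts.items()}
    return train_total, test_total, ratios


train_total, test_total, ratios = summarize(ENTITY_COUNTS)
print(train_total, test_total)  # 9167 2750
for name, ratio in ratios.items():
    print(f"{name}: test/train = {ratio:.3f}")
```

For every entity type the test set is about 0.30 times the size of the training set, so the split appears to be proportional across types.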

Table 3 BIO labeling criteria

Label   Description
B       Beginning of an entity
I       Inside of an entity
E       End of an entity
O       Non-entity
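
To show how the labels in Table 3 map back to entities, here is a minimal sketch that groups a tagged word sequence into entity spans. The tag format with a type suffix (for example `B-Symptom`), the function name, and the example sentence are assumptions for illustration; the source lists only the B, I, E and O labels.

```python
def extract_entities(tokens, tags):
    """Group tokens into (text, type) entities from B/I/E/O tags.

    ASSUMPTION: tags look like "B-Symptom"; the type suffix is not
    specified in the source table.
    """
    entities = []
    current, current_type = [], None

    def flush():
        # Close the entity being built, if any.
        nonlocal current, current_type
        if current:
            entities.append(("".join(current), current_type))
        current, current_type = [], None

    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "B":
            flush()                      # a new entity starts here
            current, current_type = [token], etype
        elif prefix in ("I", "E") and current and etype == current_type:
            current.append(token)
            if prefix == "E":
                flush()                  # explicit end of the entity
        else:
            flush()                      # "O" or an inconsistent tag
    flush()                              # entity still open at end of input
    return entities


# Hypothetical segmented sentence and tag sequence.
tokens = ["患者", "持续", "性", "头痛", "，", "服用", "阿司匹林"]
tags = ["O", "B-Symptom", "I-Symptom", "E-Symptom", "O", "O", "B-Drug"]
entities = extract_entities(tokens, tags)
print(entities)  # [('持续性头痛', 'Symptom'), ('阿司匹林', 'Drug')]
```

An entity opened by B but never closed by E is still emitted, which covers single-word entities under this label set.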

Fig.4 Chinese case entity recognition results

Fig.5 Coverage of entity label annotations

Fig.6 Ablation test method described in this paper
