Fine-Tuning Optimization Method of Chinese Named Entity Recognition Based on XLM-RoBERTa-Large-Finetuned-Conll03-English Model Combined with CRF

Journal of Jilin University (Science Edition) ›› 2026, Vol. 64 ›› Issue (2): 370-376.

LIAN Xiongjie, DONG Zhen

Center of Information Technology, Yanbian University, Yanji 133002, Jilin Province, China

Received: 2024-09-11 Online: 2026-03-26 Published: 2026-03-26
Corresponding author: DONG Zhen E-mail: dongzhenvip2024@163.com

Abstract

Abstract: Chinese text has no explicit spaces between words, so word boundaries are ambiguous and the relationship between an entity and its surrounding words is difficult to capture accurately, which lowers the accuracy of Chinese named entity recognition (NER). To address this problem, we propose a fine-tuning optimization method for Chinese NER based on the XLM-RoBERTa-Large-Finetuned-Conll03-English model combined with a conditional random field (CRF). Firstly, we build a lexicon of Chinese named entity indicator words, determine the scope of the named entities, sort the entities, and use probability calculation to obtain the optimal features of the named entities. Secondly, we introduce the features obtained by the CRF into the XLM-RoBERTa-Large-Finetuned-Conll03-English model to capture the feature sequences of named entities and the dependencies within those sequences. Finally, we add a CRF layer on top of the multilingual model to fine-tune it for Chinese NER. Experimental results show that this fine-tuning optimization method significantly improves Chinese NER performance: the model achieves higher accuracy and a lower loss value, and is better suited to Chinese NER tasks.

Key words: XLM-RoBERTa model, named entity recognition, fine-tuning optimization, conditional random field
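
As a rough illustration of what a CRF layer on top of a token classifier contributes at decoding time, the sketch below runs Viterbi decoding over per-character emission scores together with a tag-transition matrix. It is not the authors' implementation: the tag set, the numeric scores, and the function names `greedy_decode` and `viterbi_decode` are assumptions made up for this example. In the method described in the abstract, the emission scores would come from the XLM-RoBERTa-Large-Finetuned-Conll03-English encoder and the transition scores would be learned during fine-tuning.

```python
from typing import List, Sequence

# Hypothetical BIO tag set for a single entity type (assumption for illustration).
TAGS = ["O", "B-LOC", "I-LOC"]

# Hypothetical per-character emission scores for a three-character input.
# Row t holds the score of each tag at position t.
EMISSIONS = [
    [1.0, 0.8, 0.1],
    [0.2, 0.1, 1.5],
    [0.3, 0.0, 1.2],
]

# Hypothetical transition scores: TRANSITIONS[i][j] is the score of moving
# from tag i to tag j. A large negative value forbids "O" -> "I-LOC".
TRANSITIONS = [
    [0.0, 0.0, -1e4],  # from O
    [0.0, 0.0, 0.0],   # from B-LOC
    [0.0, 0.0, 0.0],   # from I-LOC
]


def greedy_decode(emissions: Sequence[Sequence[float]], tags: Sequence[str]) -> List[str]:
    """Pick the best tag at each position independently (no CRF)."""
    return [tags[max(range(len(tags)), key=row.__getitem__)] for row in emissions]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF."""
    n_tags = len(tags)
    # score[j] is the best score of any path that ends in tag j so far.
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=score.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[i] for i in path]


if __name__ == "__main__":
    print("greedy :", greedy_decode(EMISSIONS, TAGS))
    print("viterbi:", viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

With these made-up scores, independent per-character decoding yields `O, I-LOC, I-LOC`, which is not a valid BIO sequence, while Viterbi decoding with the transition constraint yields `B-LOC, I-LOC, I-LOC`. This is the sense in which a CRF layer models dependencies between adjacent labels and can help recover entity boundaries when the text has no word separators.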

CLC Number: TP391.1

LIAN Xiongjie, DONG Zhen. Fine-Tuning Optimization Method of Chinese Named Entity Recognition Based on XLM-RoBERTa-Large-Finetuned-Conll03-English Model Combined with CRF[J]. Journal of Jilin University (Science Edition), 2026, 64(2): 370-376.