Journal of Jilin University (Information Science Edition) ›› 2025, Vol. 43 ›› Issue (5): 1119-1127.

Algorithm for Extracting Entity Relationships from Knowledge Graph of Academic Text Keyword Library

WANG Zhe1,2, LIU Huan3, LIANG Peiwei3

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510000, China; 2. China Southern Power Grid Company Limited, Guangzhou 510663, China; 3. Southern Power Grid Digital Enterprise Technology Guangdong Company Limited, Guangzhou 510030, China
  • Received: 2023-11-16 Online: 2025-09-28 Published: 2025-11-20
  • About the author: WANG Zhe (born 1990), male, from Zhanjiang, Guangdong; engineer at South China University of Technology; research interests: intelligent archive management and power information system construction. (Tel) 86-13424450746 (E-mail) wzd57845@yeah.net.
  • Funding:
    Science and Technology Fund Project of China Southern Power Grid Company Limited (202200GX023)

Abstract: To extract key information quickly from massive library knowledge graphs, an entity relationship extraction algorithm for knowledge graphs of academic text keyword libraries is proposed. The OCS-FCM (Optimization of Complete Strategy Fuzzy C-Means) and E-t-SNE (Elastic-embedding t-Distributed Stochastic Neighbor Embedding) algorithms are used, respectively, to fill in missing values and to reduce the dimensionality of the keywords in the library, and a knowledge graph is built with the entities in the academic text keyword library as vertices. Based on part of speech and other keyword features, a SelfATT-BLSTM (Self-Attention Bidirectional Long Short-Term Memory) model is constructed using the self-attention mechanism to extract entity relationships from the knowledge graph and obtain the extraction results. Experimental results show that the collection accuracy of the proposed algorithm stays above 0.8, the ACC (Accuracy) value is higher than 30%, and the extraction time does not exceed 1.5 s, indicating good entity relationship extraction capability, with high accuracy and efficiency in the entity extraction process.

Key words: optimization of complete strategy fuzzy C-means (OCS-FCM) algorithm, missing data value filling, knowledge graph, self-attention bidirectional long short-term memory (SelfATT-BLSTM) model

CLC Number: 

  • TP391
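
The abstract states that a self-attention mechanism is applied on top of a bidirectional LSTM, but this page gives no details of the model. As a rough illustration of the general idea only, the sketch below computes scaled dot-product self-attention over a short sequence of vectors. Everything in it is an assumption made for illustration: the function names `softmax` and `self_attention`, the use of identity projections in place of learned query/key/value weights, and the toy `hidden` values that stand in for per-token BiLSTM outputs. It is not the authors' SelfATT-BLSTM implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention with identity projections.

    `states` is a list of equal-length vectors, standing in for the
    per-token outputs of a bidirectional LSTM (an assumption of this
    sketch). Returns (contexts, weights), where weights[i][j] is how
    strongly token i attends to token j.
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in states:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in states]
        row = softmax(scores)
        weights.append(row)
        # Context vector: attention-weighted average of all states.
        contexts.append([sum(w * s[j] for w, s in zip(row, states))
                         for j in range(dim)])
    return contexts, weights


# Toy hidden states for three tokens (made-up numbers).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts, weights = self_attention(hidden)
```

In a relation extraction setting, context vectors of this kind would typically be pooled and passed to a classifier over relation types; a trained model would also learn separate query, key, and value projections rather than reusing the raw states.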