Journal of Jilin University (Engineering and Technology Edition) ›› 2022, Vol. 52 ›› Issue (6): 1428-1433. doi: 10.13229/j.cnki.jdxbgxb20210443

• Computer Science and Technology •

Automatic completion algorithm for missing links in knowledge graph considering data sparsity

Wen-jun WANG1, Yin-feng YU2,3,4

  1. College of Computer and Network Engineering, Shanxi Datong University, Datong 037009, China
  2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  3. State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China
  4. School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
  • Received: 2021-05-19 Online: 2022-06-01 Published: 2022-06-02
  • About the author: WANG Wen-jun (1977-), male, associate professor, master's degree. Research interests: knowledge graph, data mining. E-mail: wangwenjun96966@126.com
  • Funding:
    Soft Science Project of Datong Science and Technology Bureau (2020179); Key R&D (High-Tech Field) Project of Datong Science and Technology Bureau (2020014); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2020D01C026)

Abstract:

For the data-sparse Yungang Grottoes corpus, sparse data has so far been completed by relying on a manual understanding of semantic rules, which reduces the accuracy of subsequent work to some extent. To address this, an automatic completion algorithm for missing links in knowledge graphs that takes data sparsity into account is proposed. By setting the neighborhood structure of the data, a knowledge graph embedding representation model based on data sparsity is constructed to extract relations in the data. A long short-term memory (LSTM) network model is then used to automatically complete the missing data in variable-length sequences, which realizes the automatic completion of missing links in the knowledge graph. Finally, the algorithm is applied to the construction of the Yungang Grottoes knowledge graph. The experimental results show that, on the same dataset, the accuracy of the proposed algorithm reaches 95.4%, much higher than that of the other traditional completion algorithms.

Key words: computer application, data sparsity, knowledge graph, completion algorithm, entity relationship
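
The abstract describes the embedding step only in outline. As a rough illustration of how a neighborhood structure can help rank candidates for a missing link, here is a minimal sketch. It assumes a translation-style distance ||h + r - t||, which is a stand-in chosen for brevity and not the model from the paper; the entity names, vectors, `neighbors` map, and `alpha` weight are all hypothetical.

```python
from math import sqrt

# Toy 2-D embeddings; every name and number here is illustrative, not from the paper.
entity_vec = {
    "Cave_5": [1.0, 0.0],
    "Cave_6": [0.9, 0.1],
    "Northern_Wei": [1.0, 1.0],
    "Tang": [-1.0, 1.0],
}
relation_vec = {"built_in": [0.0, 1.0]}

# Hypothetical neighborhood structure: entities treated as adjacent to each entity.
neighbors = {"Cave_6": ["Cave_5"]}


def neighborhood_embedding(entity, alpha=0.5):
    """Blend an entity's own vector with the mean of its neighbors' vectors."""
    own = entity_vec[entity]
    near = [entity_vec[n] for n in neighbors.get(entity, [])]
    if not near:
        return list(own)
    mean = [sum(col) / len(near) for col in zip(*near)]
    return [alpha * o + (1 - alpha) * m for o, m in zip(own, mean)]


def distance(head, relation, tail):
    """Translation-style distance ||h + r - t||; smaller means more plausible."""
    h = neighborhood_embedding(head)
    r = relation_vec[relation]
    t = entity_vec[tail]
    return sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def complete_tail(head, relation, candidates):
    """Return candidate tails sorted from most to least plausible."""
    return sorted(candidates, key=lambda t: distance(head, relation, t))


# The link (Cave_6, built_in, ?) is treated as missing and the candidates are ranked.
ranking = complete_tail("Cave_6", "built_in", ["Tang", "Northern_Wei"])
print(ranking)
```

Blending in neighbor vectors gives a sparsely connected entity a usable representation even when few triples mention it directly, which is the intuition behind using neighborhood information under data sparsity.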

CLC Number: 

  • TP391.1

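Fig. 1 Construction process of the entity vector structure

Fig. 2 Overall structure of the graph attention network model

Fig. 3 Structure of the relation extraction model in the knowledge graph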

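Table 1 Parameter values of the experimental dataset

Parameter                      Value      Parameter                  Value
Number of entity relations     24 987     Average path length        4.5
Number of query relations      32         Missing data items         320 000
Number of entities             389 115    Experimental data items    640 000
Number of entity attributes    2 137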

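Table 2 Comparison of experimental results of different completion algorithms

Algorithm                             Accuracy/%    Recall/%    F1-score
Proposed algorithm                    95.4          10.9        0.0277
Traditional method based on LFM       79.7          7.9         0.0253
Traditional method based on SVD       80.2          6.9         0.0231
Traditional method based on UserCF    81.1          9.7         0.0195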
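
The comparison table above reports accuracy, recall, and F1-score, but this page does not state how they were computed. The sketch below shows the standard confusion-matrix definitions, which are assumed here and may differ from the ones the paper used; the function name and the counts are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for one completion run; not taken from the paper.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
print(metrics)
```

Under these definitions F1 depends on precision and recall rather than on accuracy, so an F1 value cannot be derived from an accuracy column and a recall column alone.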