Journal of Jilin University (Engineering and Technology Edition), 2020, Vol. 50, Issue (2): 685-691. doi: 10.13229/j.cnki.jdxbgxb20180791

• Computer Science and Technology •

Knowledge graph embedding with adaptive sampling

Dan-tong OUYANG1,2, Cong MA1,2, Jing-pei LEI1,2, Sha-sha FENG1,2

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  • Received: 2018-07-29 Online: 2020-03-01 Published: 2020-03-08
  • Contact: Jing-pei LEI E-mail: ouyd@jlu.edu.cn; 378666306@qq.com
  • About the author: Dan-tong OUYANG (1968-), female, professor, doctoral supervisor. Research interests: model-based diagnosis, semantic web. E-mail: ouyd@jlu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61872159)

Abstract:

Knowledge graph (KG) data are imbalanced across relation types, and relations differ in how hard they are to train, so sampling training data at random may prevent embedding models from converging quickly. To address this problem, an adaptive method for sampling training data is proposed. The training data are grouped by relation type. During sampling, a relation type is first chosen according to a probability, and an instance is then selected at random from the chosen group for training. The probability with which each group is selected is adjusted adaptively according to the training effect. Experimental results show that adaptive grouped sampling achieves better results on the link prediction task and enables embedding models to converge faster and better.

Key words: artificial intelligence, knowledge graph embedding, translation-based embedding models, adaptive sampling, link prediction
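
The two-stage sampling procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `AdaptiveRelationSampler`, the `smoothing` parameter, the equal initial weights, and in particular the update rule (an exponential moving average of the observed loss, so that relation groups that train poorly are drawn more often) are assumptions, because the abstract does not state how the probabilities are adjusted.

```python
import random
from collections import defaultdict


class AdaptiveRelationSampler:
    """Two-stage sampler: pick a relation group, then a triple inside it."""

    def __init__(self, triples, smoothing=0.9, seed=0):
        # Group (head, relation, tail) triples by their relation.
        self.groups = defaultdict(list)
        for h, r, t in triples:
            self.groups[r].append((h, r, t))
        self.relations = sorted(self.groups)
        # ASSUMPTION: every group starts with the same weight.
        self.weights = {r: 1.0 for r in self.relations}
        self.smoothing = smoothing
        self.rng = random.Random(seed)

    def probabilities(self):
        """Current selection probability of each relation group."""
        total = sum(self.weights.values())
        return {r: w / total for r, w in self.weights.items()}

    def sample(self):
        # Stage 1: choose a relation group according to the current weights.
        group_weights = [self.weights[r] for r in self.relations]
        r = self.rng.choices(self.relations, weights=group_weights, k=1)[0]
        # Stage 2: choose one instance uniformly at random from that group.
        return self.rng.choice(self.groups[r])

    def update(self, relation, loss):
        # ASSUMPTION: weight follows a moving average of the training loss,
        # kept strictly positive so that no group is starved completely.
        old = self.weights[relation]
        new = self.smoothing * old + (1.0 - self.smoothing) * loss
        self.weights[relation] = max(1e-6, new)
```

In a training loop one would call `sample()` to obtain a triple, take a gradient step on it, and pass the resulting loss back through `update()`; other feedback signals, such as validation rank, could be substituted for the loss.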

CLC Number:

  • TP391

Table 1

Datasets

Dataset      Entities   Relations   Training set   Validation set   Test set
FB15k        14,951     1,345       483,142        50,000           59,071
WN18         40,943     18          141,442        5,000            5,000
FB15k-237    14,541     237         272,115        17,535           20,466

Table 2

Filtered (filt) Mean Rank results on FB15k, WN18 and FB15k-237

Dataset      Metric      TransE   AST     TransE_NZL   AST_NZL
FB15k        Mean Rank   142      134     144          117
             Hits@10     0.714    0.716   0.733        0.795
WN18         Mean Rank   490      457     456          425
             Hits@10     0.932    0.939   0.926        0.946
FB15k-237    Mean Rank   252      255     319          308
             Hits@10     0.422    0.423   0.443        0.458
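
For readers unfamiliar with the two metrics in the table above, the sketch below shows how Mean Rank and Hits@10 are commonly computed in the filtered setting for link prediction. It is an illustration under stated assumptions rather than the evaluation code used in the paper: the function names are hypothetical, and a lower score is assumed to mean a more plausible candidate, as is usual for distance-based models such as TransE.

```python
def rank_of_target(scores, target, known_true=(), filtered=True):
    """Rank (1 = best) of `target` among all candidate entities.

    `scores` maps each candidate entity to its score for one test triple.
    ASSUMPTION: lower score = more plausible.
    """
    target_score = scores[target]
    rank = 1
    for candidate, score in scores.items():
        if candidate == target:
            continue
        # Filtered setting: ignore candidates that form other known true triples.
        if filtered and candidate in known_true:
            continue
        if score < target_score:
            rank += 1
    return rank


def mean_rank(ranks):
    """Average rank of the correct entity (lower is better)."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k=10):
    """Fraction of test cases whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

A rank is computed for every test triple, and the list of ranks is then passed to `mean_rank` and `hits_at_k`. With `filtered=False` the same functions give the raw setting.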

Fig. 1

Mean Rank results on FB15k, WN18 and FB15k-237

Fig. 2

Hits@10 results on FB15k, WN18 and FB15k-237
