Journal of Jilin University (Engineering and Technology Edition), 2021, Vol. 51, Issue (4): 1358-1363. doi: 10.13229/j.cnki.jdxbgxb20200197


Automatic construction of knowledge graph based on massive text data

Xiao-long ZHU1,2, Zhong XIE1

  1. School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China
  2. College of Geoscience, Yangtze University, Wuhan 430100, China
  • Received: 2020-03-30 Online: 2021-07-01 Published: 2021-07-14
  • Contact: Zhong XIE E-mail: zxlong0224@tom.com

Abstract:

Existing methods for constructing knowledge graphs neglect semi-structured data, which makes construction inaccurate and time-consuming. To address this, an automatic knowledge graph construction algorithm based on massive text data is proposed. A triple extractor is used to extract triples from massive text data sources and from semi-structured data while eliminating redundant data. Based on the extraction results, a data collection function selects appropriate data objects as the text data source for knowledge graph construction. The data source then undergoes standardized processing, including text format conversion, word segmentation, and feature extraction. The underlying semantics of the data are analyzed and an XTM visualization map is drawn to form a preliminary knowledge graph. Existing knowledge in this graph is mined to compose triples of users, ratings, and items, and latent vectors are applied to information recommendation. A graph evolution algorithm predicts ratings, users, and items, and a latent-vector domain recommendation model is constructed to realize automatic evolution of the knowledge graph. Experimental results show that the algorithm achieves higher construction accuracy and lower time consumption, indicating that it is reliable and practical.
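The abstract gives no implementation details, so the following is only an illustrative sketch of two of the steps it names: eliminating redundant triples and predicting ratings from latent vectors over (user, rating, item) triples. Plain matrix factorization trained by stochastic gradient descent stands in for the paper's graph evolution algorithm, which is not described here. All function names, hyperparameters, and sample data are assumptions made for illustration.

```python
import random


def dedupe_triples(triples):
    """Drop exact duplicate triples, keeping first-seen order."""
    seen = set()
    unique = []
    for triple in triples:
        if triple not in seen:
            seen.add(triple)
            unique.append(triple)
    return unique


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_latent_vectors(triples, dim=2, epochs=1000, lr=0.02, reg=0.01, seed=0):
    """Fit one latent vector per user and per item by SGD on squared error.

    Assumed stand-in technique: regularized matrix factorization,
    not the algorithm from the paper.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, _, i in triples})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for i in items}
    for _ in range(epochs):
        for user, rating, item in triples:
            pu, qi = P[user], Q[item]
            err = rating - dot(pu, qi)
            for k in range(dim):
                pu_k = pu[k]  # keep the old value for the item update
                pu[k] += lr * (err * qi[k] - reg * pu_k)
                qi[k] += lr * (err * pu_k - reg * qi[k])
    return P, Q


def predict(P, Q, user, item):
    """Predicted rating is the inner product of the two latent vectors."""
    return dot(P[user], Q[item])


# Hypothetical (user, rating, item) triples, including one redundant entry.
raw_triples = [
    ("u1", 5.0, "a"), ("u1", 1.0, "b"),
    ("u2", 4.0, "a"), ("u2", 1.0, "b"),
    ("u3", 1.0, "a"), ("u3", 5.0, "b"),
    ("u1", 5.0, "a"),
]
triples = dedupe_triples(raw_triples)
P, Q = train_latent_vectors(triples)
```

After training, `predict(P, Q, "u1", "a")` returns the model's estimate for a given user and item. The inner-product form is chosen only because it is the simplest latent vector model; a real system would also need held-out evaluation.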

Key words: massive text data, knowledge map, triples extractor, format conversion, feature extraction

CLC Number: TP311

Fig.1 Triplet extractor framework

Fig.2 Construction process of knowledge graph

Fig.3 Data preprocessor

Fig.4 Data analysis procedure

Fig.5 XTM protocol cluster

Fig.6 Experimental operation environment

Fig.7 Construction of knowledge map and prediction of evolution accuracy
