Journal of Jilin University (Science Edition), 2020, Vol. 58, Issue (2): 355-363.

• Computer Science •

Ontology Information Extraction Based on Wikipedia Information Box

CHEN Gang, XU Xingyu

  1. School of Cyber Science and Engineering, Wuhan University, Wuhan 430079, China
  • Received: 2018-11-02  Online: 2020-03-26  Published: 2020-03-25
  • Corresponding author: CHEN Gang  E-mail: xxy_daniel@126.com

Abstract: Traditional methods extract ontology information from Wikipedia infoboxes with low precision. To address this problem, we study the structured attribute information contained in Wikipedia infoboxes. First, a set of candidate features is defined to determine the relationships between infobox attributes, and associations are established with categories, lists, articles, and Wikipedia infobox templates. Then, drawing on ontology matching methods, the structured information in the infoboxes is extracted: the similarity of each attribute pair is computed, boundary constraints are set, and, once a given precision is reached, an ontology structure describing the relationships between attributes is built together with a class hierarchy. The results show that the proposed method addresses the low precision of ontology information extraction, efficiently and correctly extracts the likely attribute structure from articles on a given topic, and discovers reasonable class relationships.

Key words: Wikipedia, information box, ontology, class hierarchy
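The abstract does not specify the candidate features or the similarity measure, so the following is only a rough illustration of what "compute the similarity of attribute pairs and apply a boundary constraint" could look like. It assumes each infobox attribute is represented by the set of values observed for it across articles, scores a pair by a weighted mix of name similarity (difflib's `SequenceMatcher.ratio`) and Jaccard overlap of the value sets, and keeps pairs whose score clears a threshold. The weight `alpha`, the threshold `0.6`, all function names, and the toy data are hypothetical assumptions, not the authors' method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity of two attribute names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_similarity(va: set[str], vb: set[str]) -> float:
    """Jaccard overlap of the value sets observed for two attributes."""
    union = va | vb
    if not union:
        return 0.0
    return len(va & vb) / len(union)


def pair_similarity(a: str, b: str, values: dict[str, set[str]], alpha: float = 0.5) -> float:
    """Hypothetical combined score: alpha * name similarity + (1 - alpha) * value overlap."""
    return alpha * name_similarity(a, b) + (1 - alpha) * value_similarity(values[a], values[b])


def matched_pairs(values: dict[str, set[str]], threshold: float = 0.6,
                  alpha: float = 0.5) -> list[tuple[str, str, float]]:
    """Return attribute pairs whose score meets the (assumed) boundary threshold, best first."""
    result = []
    for a, b in combinations(sorted(values), 2):
        score = pair_similarity(a, b, values, alpha)
        if score >= threshold:
            result.append((a, b, round(score, 3)))
    return sorted(result, key=lambda t: -t[2])


# Hypothetical toy data: attribute name -> values seen in several infoboxes.
toy_values = {
    "birth_place": {"Wuhan", "Beijing", "Shanghai"},
    "birthplace": {"Wuhan", "Beijing"},
    "spouse": {"Li Na"},
    "occupation": {"engineer", "writer"},
}

if __name__ == "__main__":
    for a, b, score in matched_pairs(toy_values):
        print(f"{a} ~ {b}: {score}")
```

With `alpha = 0.5`, a pair that shares no values can score at most 0.5, so under the 0.6 threshold only `birth_place` / `birthplace` (similar names and overlapping values) is kept. Requiring evidence from both names and values is one way a threshold could trade recall for precision; the paper's actual features and constraints may differ.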

CLC number: TP391