Journal of Jilin University (Information Science Edition)


Research on Relation Extraction Methods for Domain-Specific Concepts and Attributes

WANG Xuyang, JIANG Xiqiu

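  1. College of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Received: 2016-11-20 Online: 2017-09-29 Published: 2017-10-23
  • About the author: WANG Xuyang (b. 1974), female, from Longxi, Gansu, professor and master's supervisor at Lanzhou University of Technology; research interests include database theory and applications, data mining, and knowledge engineering. (Tel) 86-18662683957 (E-mail) 1023933720@qq.com.
  • Funding:
    National Natural Science Foundation of China (61563030)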

Research on Extraction Method of Specific Domain Concept and Property

WANG Xuyang, JIANG Xiqiu   

  1. College of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Received:2016-11-20 Online:2017-09-29 Published:2017-10-23

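Abstract: To address the difficulty of extracting relations from open Chinese text on the Internet, a new
relation extraction method is proposed. To ease the difficulty of extracting relation triples, a new method is
given for constructing relation triples from attributes and concept instances; the large set of concept-instance
relation triples it extracts contains many explicit relation triples and also some implicit ones. On this basis,
because the relation triples contain noise and errors, a co-training method based on the Adaboost iterative
algorithm is used to optimize the relation extraction model. Experiments on real text from encyclopedia entries
in the university domain show that, compared with similar relation extraction methods, the proposed method
achieves better extraction performance in recall and F-score.

Keywords: Adaboost iterative algorithm, relation extraction, relation triples, co-training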

Abstract: A new relation extraction method is proposed to address the difficulty of extracting relations from
open Chinese free text. To alleviate the difficulty of extracting relation triples, a method is proposed for
constructing relation triples from attributes and concept instances. The many concept-instance relation triples
obtained include explicit relation triples as well as some implicit ones. Because the constructed relation
triples contain noise and errors, a co-training method based on the Adaboost iterative algorithm is used to
strengthen the relation extraction model. Experiments are carried out on the text of encyclopedia entries in the
university domain, and the results show that the method achieves better recall and F-score than comparable
relation extraction methods.

Key words: co-training, relation extraction, relation triples, Adaboost iterative algorithm
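
The abstract names the Adaboost iterative algorithm but does not give the details of the co-training procedure, so the following is only a minimal sketch of the plain AdaBoost reweighting step it builds on, not the authors' method. It assumes candidate relation triples have already been turned into binary feature vectors; the three features, the toy data, and the function names `train_adaboost` and `predict` are all hypothetical and chosen for illustration.

```python
import math

def train_adaboost(X, y, rounds=3):
    """Plain AdaBoost over one-feature decision stumps.

    X: list of binary feature vectors (0/1), one per candidate triple.
    y: labels in {-1, +1} (+1 = triple judged correct).
    Returns a list of (alpha, feature_index, polarity) weak learners.
    """
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for polarity in (1, -1):
                # The stump predicts `polarity` when feature j is on, else -polarity.
                preds = [polarity if x[j] else -polarity for x in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, polarity, preds)
        err, j, polarity, preds = best
        if err >= 0.5:  # no stump beats chance under the current weights
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, polarity))
        # Raise the weight of misclassified triples, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * t * p) for w, p, t in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the stumps; returns +1 or -1."""
    score = sum(a * (pol if x[j] else -pol) for a, j, pol in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical features per candidate triple (assumptions, not from the paper):
#   [attribute word and value share a sentence, value has the expected type,
#    triple also appears in a structured infobox]
candidates = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
labels = [1, 1, 1, -1, -1, -1]

model = train_adaboost(candidates, labels, rounds=3)
print([predict(model, x) for x in candidates])
```

In a co-training setting, two such classifiers trained on different feature views would typically label unlabeled triples for each other; that step is omitted here because the abstract does not describe it.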
