Journal of Jilin University (Information Science Edition) ›› 2014, Vol. 32 ›› Issue (1): 76-81.

• Articles •

Research on Disease Ontology Learning Based on Chinese Text

HE Hai-taoᵃ, ZHENG Shan-hongᵃ, HOU Li-xinᵃ, WANG Guo-chunᵇ, WANG Luᵇ

  1. a. College of Computer Science and Engineering; b. College of Software Vocational Technology, Changchun University of Technology, Changchun 130012, China
  • Received: 2013-08-22 Online: 2014-01-24 Published: 2014-04-03
  • About the authors: HE Hai-tao (b. 1987), male, from Yongzhou, Hunan; master's student at Changchun University of Technology; research interests: ontology, intelligent systems, and the Semantic Web; (Tel) 86-18943158138, (E-mail) hht12tjl@126.com. Corresponding author: ZHENG Shan-hong (b. 1970), female (Korean ethnicity), from Changchun; associate professor at Changchun University of Technology, Ph.D., master's supervisor; research interests: intelligent systems and the Semantic Web; (Tel) 86-13756476636, (E-mail) bioszsh2007@aliyun.com.
  • Supported by:

    Natural Science Foundation Project of the Jilin Provincial Department of Science and Technology (20130101060JC)

Abstract:

To improve the efficiency and accuracy of extracting domain-ontology concepts and the relations between them, we propose a domain ontology learning model based on Chinese text. During candidate-concept extraction, compound words are handled with a modified frequent-item computation from association rule mining, combined with a bitmap that stores the physical adjacency between terms after word segmentation. Candidate concepts are then filtered by computing their domain relevance and domain consensus. Finally, association-rule confidence and hierarchical clustering are used to extract the non-taxonomic and taxonomic relations between concepts, respectively. Experimental results show that the model is reasonable for domain ontology learning and that, compared with ontology learning based on mutual information, the proposed algorithm extracts concepts and relations with higher accuracy.

Key words: ontology learning, unstructured data, association rules, bitmap, hierarchical clustering
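
The abstract names two building blocks without giving formulas: a bitmap that records which segmented terms are physically adjacent, and association-rule confidence as a signal for non-taxonomic relations. The following is a minimal Python sketch of both ideas under stated assumptions, not the authors' modified algorithm. The function names (`build_adjacency_bitmap`, `follows`, `confidence`) and the toy sentences are hypothetical, English tokens stand in for segmented Chinese terms, and confidence uses the textbook definition conf(A → B) = support(A ∧ B) / support(A), counted per sentence.

```python
def build_adjacency_bitmap(sentences):
    """Build one integer bitmap per term.

    Bit j of rows[i] is set when term j directly follows term i
    somewhere in the segmented corpus (an assumed encoding).
    """
    vocab = sorted({tok for sent in sentences for tok in sent})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = [0] * len(vocab)
    for sent in sentences:
        for left, right in zip(sent, sent[1:]):
            rows[index[left]] |= 1 << index[right]
    return index, rows


def follows(index, rows, left, right):
    """Return True if `right` directly follows `left` anywhere in the corpus."""
    if left not in index or right not in index:
        return False
    return bool((rows[index[left]] >> index[right]) & 1)


def confidence(sentences, a, b):
    """Textbook rule confidence conf(a -> b), with sentences as transactions."""
    with_a = sum(1 for sent in sentences if a in sent)
    with_both = sum(1 for sent in sentences if a in sent and b in sent)
    return with_both / with_a if with_a else 0.0


# Toy, already-segmented corpus (illustrative data only).
sentences = [
    ["type2", "diabetes", "causes", "thirst"],
    ["insulin", "treats", "diabetes"],
    ["diabetes", "causes", "fatigue"],
    ["insulin", "lowers", "glucose"],
]

index, rows = build_adjacency_bitmap(sentences)
print(follows(index, rows, "type2", "diabetes"))   # adjacent pair: compound-word candidate
print(follows(index, rows, "causes", "diabetes"))  # order matters: never seen in this order
print(confidence(sentences, "insulin", "diabetes"))
```

In this sketch an adjacency hit such as ("type2", "diabetes") would mark a compound-word candidate, while a high confidence value between two accepted concepts would suggest a non-taxonomic relation worth keeping. Any thresholds, and the paper's modification of the frequent-item computation, are not specified in the abstract and are left out here.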

CLC Number:

  • TP39