Text-oriented ontology learning methods

Cite this article: WANG Jun-hua, ZUO Wan-li, PENG Tao. Text-oriented ontology learning methods. Journal of Jilin University (Engineering and Technology Edition), 2015, 45(1): 236-244.

WANG Jun-hua^{1,2,3}, ZUO Wan-li^{1,2}, PENG Tao^{1,2}

1.College of Computer Science and Technology, Jilin University, Changchun 130012, China

2.Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China

3.College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China

ZUO Wan-li (1957-), male, professor, Ph.D. Research interests: ontology engineering, Web data mining. E-mail: wanli@jlu.edu.cn

About the author: WANG Jun-hua (1982-), female, Ph.D. candidate. Research interests: ontology engineering and natural language processing. E-mail: wangjh10@mails.jlu.edu.cn

Funding: National Natural Science Foundation of China (60973040); National Natural Science Foundation of China, Youth Fund (60903098, 61300148); Key Science and Technology Project of Jilin Province (20130206051GX); Youth Research Fund of the Jilin Province Science and Technology Program (20130522112JH)

Abstract

With the help of the text preprocessing tool Gate and the general ontology WordNet, statistics, frequent item mining, pattern matching, heuristic learning and active learning are used to learn the ontology primitives from documents: concepts (including instances), taxonomic relations between concepts, semantic relations between concepts, and concept attributes. Concept attribute learning is proposed here for the first time. Experimental results show that the proposed method improves word sense disambiguation, enriches phrase concept learning and semantic relation learning, increases the accuracy of automatic ontology construction, and reduces the cost of ontology learning.

Keywords: artificial intelligence; ontology learning; active learning; pattern matching; frequent item mining; heuristic learning

CLC number: TP18, TP391.1 Document code: A Article ID: 1671-5497(2015)01-0236-09


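Introduction

An ontology is an explicit, formal specification of a shared conceptualization. The binary relations in an ontology include taxonomic relations (IS-A), semantic relations (SR) and attribute relations (AR). Ontologies were introduced into artificial intelligence in the 1970s, and as the field has developed they have been applied ever more widely^{[1, 2]}. Because building ontologies by hand is too costly, automatic and semi-automatic ontology construction has been proposed. Early results in text-oriented ontology learning include Text2Onto^[3], Hasti^[4], OntoLearn^[5], OntoBuilder^[6] and OntoGen^[7], but most of them are prototype systems whose output ontologies cannot guide real applications. Text-oriented ontology learning therefore remains an active research topic in ontology engineering.

Xing et al.^[8] started from object-oriented analysis, extended the traditional single-layer text vector space model into a two-layer vector space model, and on that basis introduced ontology learning based on fuzzy formal concept analysis. Zouaq et al.^[9] proposed OntoCmaps, a domain-independent and open ontology learning tool that extracts deep semantic representations from corpora. OntoCmaps generates rich conceptual representations in the form of concept maps and proposes an innovative metric-based filtering mechanism. Ruiz-Martinez et al.^[10] proposed a method for building biomedical ontologies from text. It uses natural language processing and incremental knowledge acquisition to obtain the relevant concepts and relations, which are included in an OWL ontology; in addition, UMLS is used to connect isolated concept regions in the ontology. Yang et al.^[11] proposed a new ontology learning model that improves the efficiency of concept extraction and reduces ontology construction time. Domain concept extraction is the main component of the model: it combines concept extraction with personalized recommendation to make domain concept extraction more accurate and stable. Jiang et al.^[12] proposed the CRCTOL system for automatically mining ontologies from domain-specific documents. CRCTOL adopts full text parsing and a combined strategy of statistical and lexico-syntactic methods. It comprises a statistical algorithm that extracts key concepts from a document collection, a word sense disambiguation algorithm that resolves ambiguous words within key concepts, a rule-based algorithm that extracts relations between key concepts, and a modified generalized association rule mining algorithm that prunes relations unimportant to ontology learning. These results improve on the earlier ones but still cannot meet the needs of the various ontology applications.

With the help of the text preprocessing tool Gate and the general ontology WordNet, this paper uses statistics, frequent item mining, pattern matching, heuristic learning and active learning to learn the ontology primitives: concepts, taxonomic relations between concepts, semantic relations between concepts, and concept attributes. A word sense disambiguation algorithm based on active learning is proposed, which remedies the inability of the SSI algorithm^[13] to handle the case where none of the terms is monosemous in WordNet. Concept attribute learning is also added.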

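1 Concept learning

A concept may be a word or a phrase. It is the set of entities that express an idea, notion, category or class in a particular domain, that is, a set of domain vocabulary carrying a shared meaning. Concepts are obtained from terms through word sense disambiguation; terms are the surface expression of domain knowledge. Concept learning comprises term extraction and word sense disambiguation.

1.1 Term extraction

A term is a phrase or word that represents domain knowledge. Its linguistic structure is fairly fixed: it generally has left and right boundary markers, is short, and is a noun or a phrase matching certain patterns, excluding stop words. Terms also have clear statistical properties and are generally high-frequency. Combining these characteristics, this paper learns terms with statistics and pattern matching informed by linguistics. Term extraction is shown in Fig. 1.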

Fig.1 Term extraction

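The term extraction steps are as follows:

Step 1 Call the Gate interface to preprocess the corpus and write the part-of-speech tagging results to an XML document.

Step 2 Use dom4j to process the XML document, extract each word with its part of speech in turn, and store them line by line in a text file.

Step 3 Extract the nouns and compute their frequencies; nouns whose frequency satisfies the threshold are taken as terms.

Step 4 Obtain the noun phrases and compute their frequencies; noun phrases whose frequency satisfies the threshold are taken as terms.

Definition 1 (Noun phrase). A string whose mutual information is greater than 0.5.

Definition 2 (Left context dependency). [Formula missing in source.]

Definition 3 (Right context dependency). [Formula missing in source.]

Definition 4 (Mutual information). A measure of the degree of association between strings. [Formulas missing in source.]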
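
The mutual-information test used to accept noun phrases can be illustrated with a minimal sketch. It assumes the textbook pointwise mutual information over adjacent word pairs, which may differ from the formula the authors used; the function name `pointwise_mutual_information` and the toy token list are hypothetical.

```python
import math
from collections import Counter


def pointwise_mutual_information(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in `tokens`.

    Assumption: PMI(w1, w2) = log2(p(w1, w2) / (p(w1) * p(w2))), with
    probabilities estimated by relative frequency. This is the textbook
    definition, used here as a stand-in for the paper's own formula.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy input; a real run would use the POS-filtered word stream from Step 2.
tokens = "national park lies near the national park office".split()
scores = pointwise_mutual_information(tokens)
# Keep the pairs above the 0.5 cut-off named in Definition 1.
candidates = sorted(pair for pair, s in scores.items() if s > 0.5)
```

On a corpus this small almost every pair clears the cut-off; the statistic only becomes selective once the counts come from a realistic number of sentences.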

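1.2 Word sense disambiguation

A term may have several senses, whereas the sense of a concept is unique, so the sense of each term must be determined. A set of terms sharing the same sense uniquely identifies a particular concept sense. Word sense disambiguation determines the sense of each term and yields the sets of same-sense terms. This paper improves the SSI algorithm with active learning to disambiguate terms and obtain concepts. The disambiguation process consists of a learning engine and a selection engine, as shown in Fig. 2. The selection engine automatically selects the undisambiguated term with the largest information gain and submits it to a domain expert for labelling; the system then adds the feedback to the concept set so as to improve the learning engine as much as possible.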

Fig.2 Word sense disambiguation based on active learning

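The word sense disambiguation steps are as follows:

Step 1 Extract the context features of each term and generate the term context.

Definition 5 (Context). The context of a term characterizes the surroundings in which the term is used. It is recorded as a set of context feature words, each with a weight. [Notation missing in source.]

There are three kinds of context feature extraction: the sliding-window method^{[14, 15]}, extraction based on inter-word dependency relations^{[16, 17, 18]}, and extraction based on syntactic parsing^{[19, 20, 21]}. The sliding-window method is easy to implement, but because it ignores syntactic and semantic relations, its results include nearby but unrelated words and miss distant but related ones. Dependency-based extraction is more precise but yields few context words. Parsing-based extraction remedies both shortcomings to some extent, so this paper adopts it.

With parsing-based extraction, the context features of a term are the nodes of the parse tree that share an ancestor with the term or are adjacent to it in position. The steps are as follows:

① Generate the syntactic parse tree. Fig. 3 shows the parse tree of the sentence "The coaches which brought the workers to the plant are produced by FAW corporation and CN heavy duty truck factory."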

Fig.3 Example of syntactic analysis tree

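② Following the tree hierarchy, traverse the tree from the target node and collect neighbouring nodes level by level as candidate context feature words. Compute the weight of each candidate by Eq. (6) from its hierarchical relation and path distance to the ambiguous word, and continue until the root is reached. [Eq. (6) missing in source.]

Here the two adjustment factors of Eq. (6) are set to 0.4 and 0.2 respectively.

③ Sort the candidate context feature words by weight in descending order, remove function words, and select the top 8 as the context features of the ambiguous word. This gives the term context.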
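
The three sub-steps can be sketched as follows. Because Eq. (6) is not available, the weight below is a hypothetical stand-in (the reciprocal of the tree-path distance to the target word) and does not use the 0.4 and 0.2 adjustment factors; the tuple-based tree encoding and all function names are likewise assumptions made for illustration.

```python
def leaves_with_paths(tree, path=()):
    """Yield (word, path) per leaf; a path is the tuple of child indices from the root."""
    _label, children = tree
    for i, child in enumerate(children):
        if isinstance(child, str):
            yield child, path + (i,)
        else:
            yield from leaves_with_paths(child, path + (i,))


def tree_distance(path_a, path_b):
    """Number of edges between two leaves via their lowest common ancestor."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def context_features(tree, target, function_words, top_n=8):
    """Rank the other leaves by a distance-based weight and keep the top_n."""
    leaves = list(leaves_with_paths(tree))
    target_path = next(p for w, p in leaves if w == target)
    weighted = []
    for word, path in leaves:
        if word == target or word.lower() in function_words:
            continue  # drop the target itself and function words
        # Hypothetical weight (NOT Eq. (6)): closer in the tree means heavier.
        weighted.append((1.0 / tree_distance(target_path, path), word))
    weighted.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in weighted[:top_n]]


# Hand-built parse of a shortened version of the example sentence.
tree = ("S", [
    ("NP", [("DT", ["The"]), ("NNS", ["coaches"])]),
    ("VP", [("VBP", ["are"]),
            ("VP", [("VBN", ["produced"]),
                    ("PP", [("IN", ["by"]),
                            ("NP", [("NNP", ["FAW"]), ("NN", ["corporation"])])])])]),
])
features = context_features(tree, "coaches", {"the", "are", "by"})
```

In practice the tree would come from a parser rather than being written by hand, and the weight would be replaced by the paper's own formula.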

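Step 2 Compute the semantic similarity between terms and generate the term semantic similarity matrix, whose entries are the pairwise semantic similarities of the terms. [Matrix missing in source.]

Definition 6 (Term semantic similarity). Defined for two terms with a given part of speech. [Formula missing in source.]

Definition 7 (Word semantic similarity). Defined for two words with a given part of speech, over the concepts that correspond to the particular senses of each word. [Formula missing in source.]

Definition 8 (Concept semantic similarity). Defined for two concepts following^[22], in terms of the length of the shortest hypernym-hyponym path between them. [Formula missing in source.]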
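
A path-based concept similarity of the kind cited in Definition 8 can be sketched as follows. The code assumes the commonly used form of the Leacock-Chodorow measure, -log(length / (2 × depth)), with path length counted in nodes; the exact variant used in the paper is not known. The toy taxonomy `PARENT` is hypothetical and stands in for WordNet.

```python
import math

# Hypothetical toy IS-A taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "vehicle": "entity",
    "building": "entity",
    "coach": "vehicle",
    "truck": "vehicle",
    "factory": "building",
}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_length(c1, c2):
    """Number of nodes on the shortest IS-A path between c1 and c2."""
    chain1, chain2 = ancestors(c1), ancestors(c2)
    for i, node in enumerate(chain1):
        if node in chain2:
            return i + chain2.index(node) + 1
    raise ValueError("concepts share no ancestor")


def lch_similarity(c1, c2):
    """Leacock-Chodorow style similarity: -log(length / (2 * depth))."""
    depth = max(len(ancestors(c)) for c in PARENT)  # taxonomy depth in nodes
    return -math.log(path_length(c1, c2) / (2 * depth))


sim_siblings = lch_similarity("coach", "truck")    # share the parent "vehicle"
sim_distant = lch_similarity("coach", "factory")   # meet only at the root
```

Siblings score higher than concepts that meet only at the root, which is the behaviour a shortest-path measure is meant to capture.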

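Step 3 Process the terms that have only one sense in WordNet and add the corresponding concepts to the concept set.

Step 4 Process the ambiguous terms that are related to the concept set. For each such term, add the concept with the highest relatedness to the concept set according to Eq. (10). [Eq. (10) missing in source.]

Definition 9 (Concept relatedness). An indicator that takes the value 1 when the relatedness condition between two concepts holds and 0 otherwise. [Condition missing in source.]

Step 5 Select the term with the largest information gain from the set of undisambiguated terms and submit it to the domain expert. Save the feedback to the concept set and go to Step 4.

Definition 10 (Information gain). The selection engine must pick the core term from the set of undisambiguated terms, that is, the term most closely connected to the other undisambiguated terms. This paper therefore quantifies a term's information gain for the disambiguation task as the sum of its similarities to those terms. [Formula missing in source.]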
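
The selection step can be sketched under a simple reading of Definition 10: the gain of a term is the sum of its similarities to the other undisambiguated terms, and the term with the largest sum is sent to the expert. The function `select_query_term` and the toy similarity table are hypothetical.

```python
def select_query_term(unresolved, similarity):
    """Return the unresolved term with the largest summed similarity to the others."""
    def gain(term):
        return sum(similarity(term, other) for other in unresolved if other != term)
    return max(unresolved, key=gain)


# Hypothetical pairwise term similarities (symmetric).
SIM = {
    frozenset(("bank", "coach")): 0.1,
    frozenset(("bank", "plant")): 0.4,
    frozenset(("coach", "plant")): 0.6,
}


def toy_similarity(a, b):
    return SIM[frozenset((a, b))]


chosen = select_query_term(["bank", "coach", "plant"], toy_similarity)
```

Here "plant" is chosen because its summed similarity (0.4 + 0.6) is the largest, so labelling it is expected to help the remaining terms most.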

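2 Relation learning

2.1 Taxonomic relation learning

A taxonomic relation is a class-membership relation. It mostly holds between nouns that name things and follows certain linguistic patterns. Looking up taxonomic relations between concepts directly in WordNet gives high precision but insufficient recall. As shown in Fig. 4, this paper raises recall by mining CC binary frequent items and uses pattern matching to identify the taxonomic relations in the CC binary frequent item set automatically. This remedies a drawback of association-rule relation learning, in which the relation type must be determined by a person.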

Fig.4 IS-A relation learning

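The taxonomic relation learning steps are as follows:

Step 1 Call WordNet methods to obtain the taxonomic relations between concepts directly.

Step 2 With the concept set as the item set and sentences as transactions, extract CC binary frequent items using a threshold.

Definition 11 (CC binary frequent item, CC). Expresses a binary relation between concepts and is written as the pair CC=(C_i, C_j).

Step 3 Filter the CC binary frequent items using WordNet, deleting those whose relation WordNet can identify.

Step 4 Identify the taxonomic relations among the CC binary frequent items by pattern matching, using the following patterns.

Taxonomic relation patterns: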

NP such as NP, NP… and NP

Such NP as NP, NP… or NP

NP, NP… and other NP

NP, especially NP, NP… and NP

NP is a NP

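2.2 Semantic relation learning

A semantic relation describes an object property: the relation between two concepts can be expressed by a verb. The existing VCC(n) transaction method learns semantic relations between concepts under the assumption that, if two concepts both occur within n words of a verb V, the association between the verb and the concept pair can be expressed as a conditional probability^[8]. As shown in Fig. 5, this paper borrows the idea of the VCC(n) transaction method and learns semantic relations heuristically by mining CCV frequent items.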

Fig.5 Semantic relation learning

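The semantic relation learning steps are as follows:

Step 1 Extract the verb set Vset from the corpus.

Step 2 Read the CC binary frequent items whose relation was not identified during taxonomic relation learning.

Step 3 With sentences as transactions, compute [expression missing in source].

Step 4 Set the threshold [remainder missing in source].

Step 5 Obtain the semantic relation set SR using heuristic rule 1.

Heuristic rule 1: If there exists a frequent item [remainder missing in source].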

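2.3 Concept attribute learning

Concept attribute learning is one way of acquiring the intension of a concept. Linguistically, an attribute of a concept is itself a concept, and an attribute value is an instance of the attribute, so concept attributes involve both concept-concept and concept-instance associations. This diversity of information makes concept attribute learning harder. This paper proposes a concept attribute learning method that combines pattern matching, CCC ternary frequent item mining, heuristic rules and WordNet; see Fig. 6.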

Fig.6 Concept attribute learning

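The concept attribute learning steps are as follows:

Step 1 Look up the Part attributes of concepts directly in WordNet.

Step 2 Identify the concept attribute relations among the remaining CC binary frequent items by pattern matching, using the following concept attribute patterns.

NP's NP

NP part of NP

Step 3 With the concept set as the item set and sentences as transactions, mine CCC ternary frequent items using a threshold.

Step 4 Traverse the CCC ternary frequent item set and learn concept attributes using heuristic rule 2 together with WordNet.

Heuristic rule 2: If two concepts in a CCC ternary frequent item stand in a hypernym-hyponym relation, and the remaining concept has neither a hypernym-hyponym relation nor a synonym relation with them, then the item contains a concept, an attribute and an attribute value, where the attribute is the hypernym of the attribute value.

Example 1 If [condition missing in source], the result is added to the concept attribute set.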
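
Heuristic rule 2 can be sketched as a small function over one CCC ternary frequent item. The lookups `is_hypernym` and `is_synonym` are hypothetical stand-ins for WordNet queries, and the toy hypernym pairs are invented for illustration.

```python
from itertools import permutations

# Hypothetical direct hypernym pairs (hypernym, hyponym) standing in for WordNet.
HYPERNYMS = {("color", "red"), ("animal", "dog"), ("animal", "cat")}


def is_hypernym(a, b):
    """True if a is a (direct) hypernym of b in the toy table."""
    return (a, b) in HYPERNYMS


def is_synonym(a, b):
    """The toy data contains no synonyms."""
    return False


def split_attribute_triple(triple):
    """Apply heuristic rule 2; return (concept, attribute, value) or None."""
    for concept, attribute, value in permutations(triple):
        if not is_hypernym(attribute, value):
            continue  # the attribute must be the hypernym of the value
        unrelated = all(
            not is_hypernym(concept, other)
            and not is_hypernym(other, concept)
            and not is_synonym(concept, other)
            for other in (attribute, value)
        )
        if unrelated:
            return concept, attribute, value
    return None


result = split_attribute_triple(("car", "red", "color"))
rejected = split_attribute_triple(("animal", "dog", "cat"))
```

The second triple is rejected because every concept in it is tied to another by a hypernym relation, so no concept is left over to own the attribute.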

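2.4 Algorithm description

Algorithm 1 K-item frequent itemset mining

Input: the (K-1)-item frequent itemsets L_{K-1}, the transaction set D, and the threshold V.

Output: the K-item frequent itemsets L_K.

C_{K-1} = L_{K-1};
Build the candidate set C_K: all K-item sets whose every (K-1)-item subset belongs to C_{K-1};
L_K = ∅;
For each candidate E_i in C_K do
    Scan the transaction set D and count the number T_i of transactions that contain E_i;
    If T_i > V Then
        add E_i to L_K;
    End If
End For
Return L_K.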
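
One level of the mining procedure in Algorithm 1 can be written as a short runnable sketch. It follows the usual Apriori-style candidate generation; enumerating all K-combinations of the surviving items is a simplification that is only practical for small concept sets. The function name and the toy transactions are hypothetical.

```python
from itertools import combinations


def frequent_k_itemsets(prev_frequent, transactions, threshold):
    """Build the K-item frequent itemsets from the (K-1)-item ones.

    prev_frequent: iterable of (K-1)-item sets
    transactions:  iterable of item sets (here, the concepts in one sentence)
    threshold:     a candidate is kept when its count is strictly greater
    """
    prev = {frozenset(s) for s in prev_frequent}
    if not prev:
        return set()
    k = len(next(iter(prev))) + 1
    items = set().union(*prev)
    # Candidate generation: every (K-1)-item subset must itself be frequent.
    candidates = {
        frozenset(c)
        for c in combinations(sorted(items), k)
        if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
    }
    result = set()
    for cand in candidates:
        count = sum(1 for t in transactions if cand <= set(t))
        if count > threshold:
            result.add(cand)
    return result


sentences = [
    {"hotel", "beach", "city"},
    {"hotel", "beach"},
    {"hotel", "city"},
    {"beach", "hotel", "museum"},
]
pairs = frequent_k_itemsets([{"hotel"}, {"beach"}, {"city"}], sentences, threshold=1)
```

Calling the function again with `pairs` as input yields the 3-item level, which is how CC binary and CCC ternary frequent items relate to each other.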

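Algorithm 2 Pattern matching

Input: concepts C₁ and C₂, and the pattern set P.

Output: True or False.

For each pattern P_i in P do
    Build the pattern: Pattern.compile(C₁, C₂, P_i);
    For each document in the corpus do
        Create a matcher with the document as the input string;
        Run the match: matcher.find();
        If the match succeeds Then
            Return True;
        End If
    End For
End For
Return False.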
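
The pattern-matching procedure can be sketched with Python's `re` module. The templates below are loose regular-expression approximations of the taxonomic relation patterns of Section 2.1: they accept any text up to the end of the sentence between the cue phrase and the concept, and they pluralise the hypernym naively with an optional "s". The names `ISA_TEMPLATES` and `matches_isa` are hypothetical.

```python
import re

# {hyper} and {hypo} are replaced by the escaped concept names.
ISA_TEMPLATES = [
    r"\b{hyper}s?,? such as [^.;!?]*?\b{hypo}\b",
    r"\bsuch {hyper}s? as [^.;!?]*?\b{hypo}\b",
    r"\b{hypo}\b[^.;!?]*? (?:and|or) other {hyper}s?\b",
    r"\b{hyper}s?,? especially [^.;!?]*?\b{hypo}\b",
    r"\b{hypo} is an? {hyper}\b",
]


def matches_isa(hyponym, hypernym, documents):
    """Return True if any document contains an IS-A pattern linking the two concepts."""
    for template in ISA_TEMPLATES:
        pattern = re.compile(
            template.format(hyper=re.escape(hypernym), hypo=re.escape(hyponym)),
            re.IGNORECASE,
        )
        if any(pattern.search(doc) for doc in documents):
            return True
    return False


docs = [
    "The city has museums such as the Louvre and the Prado.",
    "Bali is an island.",
]
louvre_is_museum = matches_isa("Louvre", "museum", docs)
bali_is_island = matches_isa("Bali", "island", docs)
louvre_is_island = matches_isa("Louvre", "island", docs)
```

A stricter version would match noun-phrase chunks from the tagger output instead of raw text, which would avoid false hits when the two concepts merely share a sentence after the cue phrase.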

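3 Experimental results and analysis

The text-oriented ontology learning method proposed in this paper is abbreviated TOL. English texts from the tourism domain (http://www.lonelyplanet.com/destinations) were chosen as the test corpus to validate TOL. The evaluation metrics are precision, recall and F-measure. Experimental results are given for the five aspects below. Because the best thresholds differ from corpus to corpus, each threshold was determined experimentally. The results were compared with those of Text2Onto, one of the better platforms for automatic ontology construction.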
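
The three evaluation metrics can be computed as in the sketch below. It assumes that the F-measure is the balanced F1 score, which the text does not state explicitly; the function name and the toy term sets are hypothetical and are not the paper's data.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for a predicted set against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positive = len(predicted & gold)
    precision = true_positive / len(predicted) if predicted else 0.0
    recall = true_positive / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: four extracted terms, three reference terms, two in common.
p, r, f = precision_recall_f1(
    {"hotel", "beach", "city", "the"},
    {"hotel", "beach", "museum"},
)
```

The same function applies to terms, relations or attributes, as long as each learned item can be compared for equality with a reference item.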

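3.1 Term extraction

Experiment 1 examines how different values of the thresholds α and β affect the three evaluation metrics during term extraction in TOL, in order to choose the best thresholds. α ranges over 0.003~0.009 and β over 0.002~0.008. The effect of the thresholds on the metrics is shown in Figs. 7-9. For precision, thresholds that are too small admit noise, while thresholds that are too large filter out useful information. The overall result is best when α is 0.006 and β is 0.004. Table 1 compares the results at α = 0.006 and β = 0.004 with those of Text2Onto: every metric is better than Text2Onto's, which is closely tied to the phrase learning added in this paper. All subsequent experiments use α = 0.006 and β = 0.004.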

Fig.7 α-β-precision

Fig.8 α-β-recall

Fig.9 α-β-F-measure

Table 1 Comparison of term extraction results

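3.2 Word sense disambiguation

Experiment 2 examines the performance of the active-learning-based word sense disambiguation algorithm in TOL. The test set is the all-words disambiguation task of Senseval-3. It consists of 3 documents containing 349 sentences and 4903 words, of which 1969 words and 114 phrases require disambiguation. Text2Onto takes terms directly as concepts and provides no word sense disambiguation, so the results are compared only with those of the SSI algorithm; see Table 2. TOL is markedly higher than SSI on every metric, which shows that appropriate expert intervention is an effective way to improve word sense disambiguation.

Table 2 Comparison of word sense disambiguation results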

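3.3 Taxonomic relation learning

Experiment 3 examines how the threshold θ affects the three evaluation metrics during taxonomic relation learning in TOL, in order to choose the best threshold. θ ranges over 3~9, and the effect of the threshold on the metrics is shown in Fig. 10. The overall result is best when θ is 6. Table 3 compares the results at θ = 6 with those of Text2Onto. TOL is higher than Text2Onto on every metric, with precision 6 percentage points higher. This is attributed to the combination of several strategies in the taxonomic relation learning method.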

Fig.10 θ-precision-recall-F-measure

Table 3 Comparison of IS-A relation learning results

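3.4 Semantic relation learning

Experiment 4 examines how the threshold γ affects the three evaluation metrics during semantic relation learning in TOL, in order to choose the best threshold. γ ranges over 3~9, and the effect of the threshold on the metrics is shown in Fig. 11. The overall result is best when γ is 5. Table 4 compares the results at γ = 5 with those of Text2Onto. The precision, recall and F-measure of TOL are all slightly higher than those of Text2Onto.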

Fig.11 γ-precision-recall-F-measure

Table 4 Comparison of semantic relation learning results

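3.5 Concept attribute learning

Experiment 5 examines the effect of the threshold δ on concept attribute learning in TOL (Fig. 12). The overall result is best when the threshold is 4. Text2Onto does not yet provide concept attribute learning, so no comparison is given.

Fig.12 δ-precision-recall-F-measure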

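4 Conclusion

The text-oriented ontology learning method proposed in this paper learns terms using statistics and noun-phrase patterns; improves the SSI algorithm with active learning to disambiguate terms and obtain concepts; learns hypernym-hyponym relations between concepts by combining frequent item mining and pattern matching with WordNet; learns semantic relations heuristically by mining CCV frequent items; and learns concept attributes by combining pattern matching, CCC ternary frequent item mining, heuristic rules and WordNet. The experiments show that TOL performs well overall: it improves word sense disambiguation, enriches phrase concept learning and semantic relation learning, raises the accuracy of automatic ontology construction, and can lower the cost of building ontologies. In particular, the active-learning-based disambiguation algorithm remedies the inability of the SSI algorithm to handle the case where none of the terms is monosemous in WordNet, and concept attribute learning is added.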

The authors have declared that no competing interests exist.

References

[1]	Ye Yu-xin, Ouyang Dan-tong, Ling Ji, et al. Research and design of reasoning algorithm with ontologies and rules[J]. Journal of Jilin University (Engineering and Technology Edition), 2009, 39(5): 1297-1302.
[2]	Bai Yan, Liu Da-you, Liu Jie. Ontology based information alignment method in mobile Agent communication[J]. Journal of Jilin University (Engineering and Technology Edition), 2007, 37(3): 587-590.
[3]	Cimiano P, Völker J. Text2Onto: a framework for ontology learning and data-driven change discovery[C]∥LNCS, 2005, 3513: 227-238.
[4]	Shamsfard M, Barforoush A A. Learning ontologies from natural language texts[J]. International Journal of Human-Computer Studies, 2004, 60(1): 17-63.
[5]	Navigli R, Velardi P, Gangemi A. Ontology learning and its application to automated terminology translation[J]. IEEE Intelligent Systems, 2003, 18(1): 22-31.
[6]	Gal A, Modica G, Jamil H. OntoBuilder: fully automatic extraction and consolidation of ontologies from web sources[C]∥Proc of the ICDE, Boston: IEEE Computer Society, 2004: 853-858.
[7]	Fortuna B, Grobelnik M, Mladenic D. OntoGen: semi-automatic ontology editor[C]∥HCII, 2007: 309-318.
[8]	Xing Jun, Han Min. An ontology learning method based on double VSM and fuzzy FCA[J]. Journal of Computer Research and Development, 2009, 46(3): 443-451.
[9]	Zouaq A, Gasevic D, Hatala M. Towards open ontology learning and filtering[J]. Information Systems, 2011, 36(7): 1064-1081.
[10]	Ruiz-Martinez J M, Valencia-Garcia R, Fernandez-Breis J T, et al. Ontology learning from biomedical natural language documents using UMLS[J]. Expert Systems with Applications, 2011, 38(10): 12365-12378.
[11]	Yang Qing, Cai Kai-min, Sun Jun-li, et al. Design analysis and implementation for ontology learning model[C]∥ICCET, 2010: 164-167.
[12]	Jiang Xing, Tan Ah-hwee. CRCTOL: a semantic-based domain ontology learning system[C]∥ICCET, 2010: 3164-3167.
[13]	Navigli R, Velardi P. Structural semantic interconnections: a knowledge-based approach to word sense disambiguation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(7): 1075-1086.
[14]	Patwardhan S, Banerjee S, Pedersen T. UMND1: unsupervised word sense disambiguation using contextual semantic relatedness[C]∥The 4th International Workshop on Semantic Evaluations, 2007: 390-393.
[15]	Pedersen T, Kolhatkar V. WordNet::SenseRelate::AllWords: a broad coverage word sense tagger that maximizes semantic relatedness[C]∥NAACL HLT, 2009: 17-20.
[16]	McCarthy D, Koeling R, Weeds J, et al. Unsupervised acquisition of predominant word senses[J]. Computational Linguistics, 2007, 33(4): 553-590.
[17]	Agirre E, Lopez de Lacalle O, Soroa A. Knowledge-based WSD on specific domains: performing better than generic supervised WSD[C]∥The Twenty-First International Joint Conference on Artificial Intelligence, 2009: 1501-1506.
[18]	Lu Zhi-mao, Liu Ting, Zhang Gang, et al. Word sense disambiguation based on dependency relationship analysis and Bayes model[J]. High Technology Letters, 2003, 13(5): 1-7.
[19]	Chen P, Ding W, Bowes C, et al. A fully unsupervised word sense disambiguation method using dependency knowledge[C]∥Human Language Technologies: The Annual Conference of the North American Chapter of the ACL, 2009: 28-36.
[20]	Lu Wen-peng, Huang He-yan, Zhu Chao-yong. Feature words selection for knowledge-based word sense disambiguation with syntactic parsing[J]. Przeglad Elektrotechniczny, 2012, 88: 82-87.
[21]	Huang He-yan, Lu Wen-peng. Knowledge-based word sense disambiguation with feature words based on dependency relation and syntax tree[J]. IJACT, 2011, 3(8): 73-81.
[22]	Leacock C, Chodorow M. Combining local context and WordNet similarity for word sense identification[M]∥Fellbaum C. WordNet: An Electronic Lexical Database. Cambridge: MIT Press, 1998: 265-283.
