J4 ›› 2010, Vol. 48 ›› Issue (02): 277-283.

• Computer Science •

A Novel Text Clustering Method Based on Ontology

ZHU Huifeng, ZUO Wanli, HE Fengling, PENG Tao, JI Wenyan

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  • Received: 2009-04-09 Online: 2010-03-26 Published: 2010-03-22
  • Contact: ZUO Wanli E-mail: wanly@mail.jlu.edu.cn

Abstract:

This paper presents an ontology-based text clustering method that introduces WordNet into text representation and defines a key concept set for each text. The concept nodes in WordNet and the semantic relations between them are used to reduce the dimensionality of the text feature vector and thereby improve clustering quality. During clustering, the algorithm computes text similarity from each text's key concept set and concept feature vector, and labels each cluster with key concept sets so that every cluster in the result carries an explanation. Experimental results show that, compared with other text clustering algorithms, the method effectively reduces the dimensionality of the text feature vector, improves clustering quality, and makes the clustering results easier to interpret.
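
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so everything here is an assumption made for illustration. That includes the toy `TOY_CONCEPTS` table standing in for a WordNet lookup, the choice of the `k` most frequent concepts as the key concept set, and the `alpha`-weighted mix of Jaccard overlap and cosine similarity.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet lookup: word -> concept label.
# A real system would resolve words to WordNet concept nodes instead.
TOY_CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "dog": "animal", "cat": "animal", "puppy": "animal",
    "road": "way", "street": "way",
}

def concept_vector(tokens):
    """Count concepts instead of raw words; unmapped words are dropped."""
    return Counter(TOY_CONCEPTS[t] for t in tokens if t in TOY_CONCEPTS)

def key_concepts(vector, k=2):
    """Assumed definition: the k most frequent concepts of a document."""
    return {concept for concept, _ in vector.most_common(k)}

def cosine(u, v):
    """Cosine similarity between two sparse concept count vectors."""
    dot = sum(u[c] * v[c] for c in u if c in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def jaccard(a, b):
    """Overlap between two key concept sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity(tokens_a, tokens_b, alpha=0.5, k=2):
    """Assumed combination: weighted sum of key-set overlap and vector cosine."""
    va, vb = concept_vector(tokens_a), concept_vector(tokens_b)
    key_part = jaccard(key_concepts(va, k), key_concepts(vb, k))
    return alpha * key_part + (1 - alpha) * cosine(va, vb)

doc_a = ["car", "truck", "road"]
doc_b = ["automobile", "street", "automobile"]
doc_c = ["dog", "cat", "puppy"]
print(similarity(doc_a, doc_b), similarity(doc_a, doc_c))
```

In this toy data, `doc_a` and `doc_b` share no surface words, yet they map to the same two concepts, so five distinct words collapse into two concept dimensions and the documents score as near-identical. The key concept set of a cluster's documents could likewise serve as a readable label for that cluster.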

Key words: ontology, WordNet, key concept set, concept feature vector

CLC Number:

  • TP391