J4 ›› 2010, Vol. 28 ›› Issue (01): 68-.


XML Document Clustering Research Based on Weighted Cosine Similarity

LI Wei1|SUN Tao1,CHEN Jian-xiao2,LUO Zi-heng1|LI Xiong-fei1
  

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China;
  2. Department of Mathematics and Information Technology, Hanshan Teachers College, Chaozhou 521041, China
  • Online:2010-01-20 Published:2010-04-06

Abstract:

In practical applications, parts of the structure of an XML (eXtensible Markup Language) document often change. To mine the knowledge hidden in the structures that change frequently over an XML document's change history, a method for finding these frequently changing structures is proposed. Each XML document is then represented by a document-vector model composed of a set of frequently changing structures, the proportion in which each frequently changing structure appears in a cluster is used as its weight, and the XML documents are clustered using weighted cosine similarity. Experimental analysis shows that XML documents are clustered better when the clustering is based on the frequently changing structures found in their change histories. When XML documents are clustered using weighted cosine similarity, the precision, recall, and intra-cluster distance of the clustering result are all better than those obtained with non-weighted cosine similarity.
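
The weighting and similarity steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formulas, so everything here is an assumption: documents are binary vectors over a fixed list of frequently changing structures, a structure's weight is the fraction of documents in a cluster that contain it, and the similarity is one common form of weighted cosine. The function names `cluster_weights` and `weighted_cosine` are hypothetical.

```python
import math


def cluster_weights(cluster_docs):
    """Assumed weighting: fraction of the cluster's documents containing each structure."""
    n = len(cluster_docs)
    dims = len(cluster_docs[0])
    return [sum(1 for d in cluster_docs if d[i] > 0) / n for i in range(dims)]


def weighted_cosine(x, y, w):
    """One common weighted cosine: sum(w*x*y) / (sqrt(sum(w*x^2)) * sqrt(sum(w*y^2)))."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    norm_x = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    norm_y = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a vector with no weighted mass is treated as dissimilar
    return num / (norm_x * norm_y)


# Toy cluster: rows are documents, columns are three hypothetical
# frequently changing structures (1 = the structure occurs in the document).
cluster = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
weights = cluster_weights(cluster)

doc_a = [1, 1, 0]
doc_b = [1, 0, 1]
similarity = weighted_cosine(doc_a, doc_b, weights)
plain = weighted_cosine(doc_a, doc_b, [1.0, 1.0, 1.0])  # non-weighted baseline
```

With all weights set to 1 the function reduces to ordinary cosine similarity, which is the non-weighted baseline the abstract compares against. Under this sketch, structures shared by most of a cluster dominate the score, so two documents that agree on a widely shared structure score higher than they would under the plain measure.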

Key words: XML document clustering, weighted cosine similarity, frequently changing structures

CLC Number: TP391.1