J4 ›› 2009, Vol. 27 ›› Issue (06): 611-.


XML Document Clustering Research Based on Weighted Cosine Similarity

LI Wei1a,SUN Tao1a,YE Yuan-yuan1b,LI Xiong-fei1a|LI Nan2
  

  1. 1a. College of Computer Science and Technology; 1b. College of Software, Jilin University, Changchun 130012, China; 2. Changchun Railway Vehicles Company Limited, Changchun 130062, China
  • Online:2009-11-20 Published:2009-12-18

Abstract:

To mine the knowledge hidden in structures that seldom change over the change history of an XML (Extensible Markup Language) document, this paper proposes a method for finding such frozen structures. Each XML document is then represented by a document vector model composed of a group of frozen structures, the weighted Jaccard coefficient is used as the similarity measure, and the XML documents are clustered on the relatively stable frozen structures found in their historical change process. Experiments show that XML documents can be clustered effectively on the basis of frozen structures and that, after clustering, the documents in each cluster share similar, rarely changed structures.
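
As an illustration of the similarity step, the following is a minimal sketch of a weighted Jaccard coefficient between two documents represented as vectors of frozen structures. It assumes the common generalized form (sum of element-wise minima divided by sum of element-wise maxima); the abstract does not give the exact formula or weighting scheme used in the paper, so the function name `weighted_jaccard`, the path-like structure labels, and the weights below are all hypothetical.

```python
def weighted_jaccard(u, v):
    """Generalized Jaccard similarity of two sparse, non-negative weight vectors.

    u and v map a frozen-structure label to its weight (assumed >= 0).
    This form is an assumption; the paper may define the coefficient differently.
    """
    keys = set(u) | set(v)
    numerator = sum(min(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    denominator = sum(max(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    # Two empty vectors share no structures, so treat their similarity as 0.
    return numerator / denominator if denominator else 0.0


# Hypothetical documents: each key stands for one frozen structure.
doc_a = {"/catalog/book/title": 1.0, "/catalog/book/author": 0.5}
doc_b = {"/catalog/book/title": 1.0, "/catalog/book/price": 0.5}

similarity = weighted_jaccard(doc_a, doc_b)
```

A pairwise similarity of this kind could then be passed to any standard clustering procedure; which procedure the authors used is not stated in the abstract.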

Key words: extensible markup language (XML) document, document clustering, weighted Jaccard coefficient, frozen structures

CLC Number: 

  • TP391.1