Journal of Jilin University (Engineering and Technology Edition) ›› 2014, Vol. 44 ›› Issue (01): 124-128. doi: 10.13229/j.cnki.jdxbgxb201401022


Clustering XML documents by layer information

LIU Zhao-jun1,2, ZHAO Hao-yu3, WANG Jing1,2, LI Xiong-fei1,2, LI Wei1,2   

  1. Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun 130012, China;
    2. College of Computer Science and Technology, Jilin University, Changchun 130012, China;
    3. College of Software, Jilin University, Changchun 130012, China
  • Received: 2012-11-23 Online: 2014-01-01 Published: 2014-01-01

Abstract:

This paper proposes CXLI, a layer-sensitive method for clustering collections of XML documents. First, the concept of a structural table is introduced to eliminate duplicate structures in XML documents. Second, constraints on the edit operations are established. Third, a method for measuring the similarity between XML documents is presented. Finally, the XML documents are clustered using an agglomerative hierarchical clustering method. The ACM SIGMOD data set and a synthetic data set are used to evaluate the proposed method. The results show that CXLI achieves better precision at a similar time cost.
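The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the CXLI algorithm itself: the abstract does not define the structural table, the edit-operation constraints, or the similarity formula, so everything below is an assumption made for illustration. The hypothetical `layer_paths` function collapses repeated structures by storing tag paths in a set, `similarity` is an assumed layer-weighted Jaccard measure (weight `decay ** depth`, so shallow layers count more), and `agglomerative` uses average linkage with an assumed stopping `threshold`.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def layer_paths(xml_text):
    """Return the set of (depth, tag-path) pairs found in a document.

    Storing paths in a set collapses repeated sibling structures. This is
    only a rough stand-in for the duplicate removal that the abstract
    attributes to the structural table (assumption, not the paper's method).
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, 0, root.tag)]
    while stack:
        node, depth, path = stack.pop()
        paths.add((depth, path))
        for child in node:
            stack.append((child, depth + 1, path + "/" + child.tag))
    return paths


def similarity(a, b, decay=0.5):
    """Layer-weighted Jaccard similarity (assumed weighting: decay ** depth)."""
    def weight(items):
        return sum(decay ** depth for depth, _ in items)

    union = weight(a | b)
    return weight(a & b) / union if union else 1.0


def agglomerative(features, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two most similar clusters and stops once the best
    average similarity drops below `threshold` (a hypothetical parameter).
    Returns clusters as sorted lists of document indices.
    """
    clusters = [[i] for i in range(len(features))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        total = sum(similarity(features[i], features[j]) for i, j in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        (x, y), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # x < y is guaranteed by combinations, so deleting y keeps x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)
```

In this sketch, two documents that differ only in how often a sibling element repeats map to the same path set and therefore end up in the same cluster, while documents with no shared tag paths stay apart for any positive threshold.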

Key words: artificial intelligence, data mining, XML, similarity detection, clustering, layer

CLC Number: TP18

