Journal of Jilin University (Engineering and Technology Edition), 2013, Vol. 43, Issue (01): 130-134.

New clustering method of mixed-attribute data

BAI Tian1,2, JI Jin-chao1, HE Jia-liang1, ZHOU Chun-guang1   

  1. College of Computer Science and Technology, Jilin University, Changchun 130022, China;
  2. Department of Computer Science, Rutgers, the State University of New Jersey, New Jersey, NJ 08901, USA
  • Received: 2011-12-28; Online: 2013-01-01; Published: 2013-01-01

Abstract: A new Global k-Prototype (GKP) algorithm is proposed for clustering data with mixed numeric and categorical attributes. First, the algorithm randomly selects a sufficiently large number of initial prototypes so that they reflect the global distribution of the data set. Then, it progressively eliminates redundant prototypes through an iterative optimization process guided by an elimination criterion function. Systematic experiments were carried out on datasets widely used in this area. The experimental results and a comparative evaluation show that the proposed algorithm performs well and consistently. Compared with other well-known mixed-data clustering algorithms, the proposed algorithm significantly improves clustering accuracy.
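The two stages described in the abstract (over-seeding with many prototypes, then eliminating redundant ones during iterative optimization) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the dissimilarity measure or the elimination criterion, so the code below assumes the standard k-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches) and a hypothetical criterion that drops the prototype with the fewest assigned records. The names `mixed_distance`, `assign`, `update`, and `gkp_sketch`, and the parameters `n_init` and `n_iter`, are all illustrative.

```python
import random
from collections import Counter


def mixed_distance(x_num, x_cat, p_num, p_cat, gamma=1.0):
    """k-prototypes-style dissimilarity (assumed, not taken from the paper):
    squared Euclidean distance on numeric attributes plus gamma times
    the number of categorical mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    mismatches = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric + gamma * mismatches


def assign(data, prototypes, gamma):
    """Return, for each record, the index of its nearest prototype."""
    return [
        min(range(len(prototypes)),
            key=lambda j: mixed_distance(num, cat, *prototypes[j], gamma))
        for num, cat in data
    ]


def update(data, labels, prototypes):
    """Recompute each prototype as (numeric mean, categorical mode).
    A prototype with no members is left unchanged."""
    new = []
    for j, proto in enumerate(prototypes):
        members = [data[i] for i, lab in enumerate(labels) if lab == j]
        if not members:
            new.append(proto)
            continue
        nums = [m[0] for m in members]
        cats = [m[1] for m in members]
        mean = tuple(sum(col) / len(col) for col in zip(*nums))
        mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*cats))
        new.append((mean, mode))
    return new


def gkp_sketch(data, k, n_init, gamma=1.0, n_iter=10, seed=0):
    """Over-seed with n_init prototypes, then remove one per round until k remain.
    Elimination rule (hypothetical): drop the prototype with the fewest members."""
    rng = random.Random(seed)
    prototypes = rng.sample(data, n_init)  # randomly chosen records as initial prototypes
    while True:
        for _ in range(n_iter):  # k-prototypes-style refinement
            labels = assign(data, prototypes, gamma)
            prototypes = update(data, labels, prototypes)
        labels = assign(data, prototypes, gamma)
        if len(prototypes) <= k:
            return prototypes, labels
        sizes = Counter(labels)
        weakest = min(range(len(prototypes)), key=lambda j: sizes[j])
        del prototypes[weakest]


# Each record is (numeric attributes, categorical attributes).
data = [((0.0,), ("a",)), ((0.1,), ("a",)), ((0.2,), ("a",)),
        ((10.0,), ("b",)), ((10.1,), ("b",)), ((10.2,), ("b",))]
prototypes, labels = gkp_sketch(data, k=2, n_init=4)
```

On this toy data the two well-separated groups each end up with one surviving prototype. A criterion based on the change in the clustering cost function could be substituted for the member-count rule without altering the rest of the loop.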

Key words: artificial intelligence, clustering, data mining, K-prototypes algorithm, mixed attribute data

CLC Number: TP181