Journal of Jilin University(Engineering and Technology Edition) ›› 2024, Vol. 54 ›› Issue (5): 1393-1400.doi: 10.13229/j.cnki.jdxbgxb.20221338

Previous Articles    

An efficient global K-means clustering algorithm based on weighted space partitioning

Fu-heng QU1(),Yue-tao PAN1,Yong YANG1,2,Ya-ting HU3,Jian-fei SONG1,Cheng-yu WEI3   

  1. 1.College of Computer Science and Technology,Changchun University of Science and Technology,Changchun 130022,China
    2.Jilin Technology College of Electronic Information,Jilin 132021 China
    3.College of Information Technology,Jilin Agricultural University,Changchun 130118,China
  • Received:2022-10-18 Online:2024-05-01 Published:2024-06-11

Abstract:

Aiming at the problem of large amount of calculation caused by exhaustive sample points in global K-means clustering algorithm, this paper proposes an efficient global K-means clustering algorithm based on weighted space partition. Firstly, the sample space is divided into grids, and then the density criterion and distance criterion are proposed to filter the grids, and the grids with large density and far distance from each other are retained as candidate center grids. In order to avoid the limitation that the global K-means algorithm only selects candidate centers in the sample set, the weight criterion and the center iteration strategy are proposed to expand the candidate centers and increase the diversity of the candidate centers. Finally, the candidate centers were traversed by incremental clustering to obtain the final clustering result. The experimental results on UCI data sets show that compared with the global K-means algorithm, the computational efficiency of the new algorithm is improved by 89.39%~95.79% on average under the premise of ensuring the clustering accuracy. Compared with K-means ++, IK-+ and the recently proposed CD algorithm, the new algorithm has higher accuracy and overcomes the problem of unstable clustering results caused by random initialization.

Key words: artificial intelligence, K-means algorithm, clustering center, multidimensional grid space, weight, incremental clustering

CLC Number: 

  • TP391

Fig.1

Partition results of spatial structure of two-dimensional grids"

Fig.2

Schematic diagram of cluster division"

Fig.3

Select the initial cluster center to be selected before expansion"

Fig.4

Selection of initial cluster centers to be selected after expansion"

Table 1

Information of five UCI datasets"

数据集数据个数特征维数
Image Segmentation(IS)2 31019
Gender Gap in Spanish WP(GGSW)4 74619
Isolet7 797617
AI4I 2020 Predictive Maintenance(APM)10 00011
Condition Based Maintenance(CBM)11 93416

Table 2

Operation results of different number of clusters in the dataset"

数据集聚类个数SSE
KMK-means++CDIK-+GKM本文
ISK=52.006 0E+072.059 8E+072.028 3E+071.962 8E+071.714 3E+071.714 3E+07
K=101.137 1E+071.221 4E+071.135 0E+071.209 8E+079.967 2E+061.012 8E+07
K=206.664 2E+066.165 8E+066.456 7E+065.481 0E+065.132 8E+065.299 9E+06
K=503.525 9E+062.538 2E+063.399 1E+062.344 2E+062.278 6E+062.267 0E+06
GGSWK=51.098 6E+241.064 8E+241.097 3E+241.060 9E+241.019 5E+241.019 5E+24
K=105.237 9E+235.103 3E+235.225 7E+235.001 5E+234.691 1E+234.696 6E+23
K=202.524 1E+232.242 6E+232.513 3E+232.267 7E+232.072 1E+232.095 8E+23
K=507.839 6E+225.009 4E+227.783 6E+226.487 3E+224.556 1E+224.411 7E+22
IsoletK=56.170 0E+056.175 0E+056.163 8E+056.168 8E+056.286 5E+056.136 5E+05
K=105.360 6E+055.376 6E+055.346 9E+055.401 5E+055.435 6E+055.306 8E+05
K=204.671 5E+054.653 6E+054.681 8E+054.664 0E+054.606 2E+054.663 0E+05
K=504.055 0E+054.048 5E+054.049 2E+054.021 2E+054.070 3E+054.038 9E+05
APMK=57.270 2E+077.100 2E+077.267 4E+077.185 1E+077.118 3E+077.118 3E+07
K=103.231 8E+073.230 8E+073.231 8E+073.233 7E+073.228 6E+073.226 6E+07
K=201.611 7E+071.654 3E+071.610 8E+071.600 1E+071.580 0E+071.593 6E+07
K=506.902 6E+066.736 6E+066.846 1E+066.461 5E+066.297 1E+066.306 4E+06
CBMK=53.173 0E+111.491 5E+113.185 3E+111.542 5E+111.149 0E+111.670 0E+11
K=107.165 0E+101.191 0E+097.165 0E+101.300 3E+091.191 0E+091.191 0E+09
K=202.147 8E+108.484 7E+072.147 4E+108.657 5E+078.176 0E+078.159 0E+07
K=501.427 0E+082.389 2E+071.426 6E+082.275 9E+072.182 9E+072.624 3E+07

Table 3

Accuracy improvement percentage of each optimization algorithm"

聚类个数K-means++CDIK-+GKM本文
K=511.13%-0.25%11.63%17.15%14.35%
K=1018.65%0.13%19.10%23.96%24.15%
K=2023.20%0.68%25.68%28.77%27.67%
K=5029.99%1.06%28.41%34.07%34.01%
平均值20.74%0.41%21.20%25.99%25.05%

Fig.5

Comparison of running time between the proposed algorithm and GKM algorithm"

1 Rahman M A, Islam M Z. A hybrid clustering technique combining a novel genetic algorithm with K-means[J]. Knowledge-Based Systems, 2014, 71: 345-365.
2 Harjanti T W, Setiyani H, Trianto J, et al. Classification of mint leaf types based on the image using euclidean distance and K-means clustering with shape and texture feature extraction[J]. Tech-E, 2022, 5(2): 115-124.
3 刘仲民, 李战明, 李博皓,等. 基于稀疏矩阵的谱聚类图像分割算法[J]. 吉林大学学报: 工学版, 2017,47(4): 1308-1313.
Liu Zhong-min, Li Zhan-ming, Li Bo-hao, et al. Spectral clustering image segmentation based on sparse matrix[J]. Journal of Jilin University (Engineering and Technology Edition), 2017, 47(4): 1308-1313.
4 Lim Z Y, Ong L Y, Leow M C. A review on clustering techniques: creating better user experience for online roadshow[J]. Future Internet, 2021, 13(9): 233.
5 张萌谡, 刘春天, 李希今, 等. 基于K-means聚类算法的绩效考核模糊综合评价系统设计[J]. 吉林大学学报: 工学版, 2021, 51(5): 1851-1856.
Zhang Meng-su, Liu Chun-tian, Li Xi-jin, et al. Design of fuzzy comprehensive evaluation system for performance appraisal based on K-means clustering algorithm[J]. Journal of Jilin University (Engineering and Technology Edition), 2021, 51(5): 1851-1856.
6 Fränti P, Sieranoja S. How much can K-means be impr-oved by using better initialization and repeats?[J]. Pattern Recognition, 2019, 93: 95-112.
7 杨勇, 陈强, 曲福恒, 等. 基于模拟划分的SP-K-means-+算法[J]. 吉林大学学报: 工学版, 2021, 51(5): 1808-1816.
Yang Yong, Chen Qiang, Qu Fu-heng, et al. SP-K-means-+ algorithm based on simulated partition[J]. Journal of Jilin University (Engineering and Technology Edition), 2021, 51(5): 1808-1816.
8 Geng X, Mu Y, Mao S, et al. An improved K-means algorithm based on fuzzy metrics[J]. IEEE Access, 2020, 8: 217416-217424.
9 邵伦, 周新志, 赵成萍, 等. 基于多维网格空间的改进 K-means 聚类算法[J]. 计算机应用,2018, 38(10): 2850-2855.
Shao Lun, Zhou Xin-zhi, Zhao Cheng-ping, et al. Improved K-means clustering algorithm based on multidimensional grid space[J]. Computer Application, 2018,38(10): 2850-2855.
10 Manochandar S, Punniyamoorthy M, Jeyachitra R K. Development of new seed with modified validity measures for K-means clustering[J]. Computers & Industrial Engineering, 2020, 141: 106290.
11 Zhang S, Liu Z, Chen X, et al. Parallel implementation of improved K-means based on a cloud platform[J]. Information Technology and Control, 2019, 48(4): 673-681.
12 Arthur D, Vassilvitskii S. K-means++ the advantages of careful seeding[C]∥Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete algorithms, New Orleans, USA, 2007: 1027-1035.
13 Lattanzi S, Sohler C. A better K-means++ algorithm via local search[C]∥Proceedings of the 36th International Conference on Machine Learning,Long Beach,USA,2019: 3662-3671.
14 Choo D, Grunau C, Portmann J, et al. K-means++: few more steps yield constant approximation[C]∥Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, 2020: 1909-1917.
15 Ismkhan H. Ik-means-+: an iterative clustering algorithm based on an enhanced version of the K-means[J]. Pattern Recognition, 2018, 79: 402-413.
16 Chowdhury K, Chaudhuri D, Pal A K. An entropy-based initialization method of K-means clustering on the optimal number of clusters[J]. Neural Computing and Applications, 2021, 33: 6965-6982.
17 Nie F, Xue J, Wu D, et al. Coordinate descent meth-od for k-means[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021,44(5): 2371-2385.
18 Likas A, Vlassis N, Verbeek J J. The global k-means clustering algorithm[J]. Pattern Recognition, 2003,36(2): 451-461.
[1] Gui-zhen CHEN,Hui-ting CHENG,Cai-hua ZHU,Yu-ran LI,Yan LI. A risk evaluation method for urban intersections considering drivers' physiological information [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(5): 1277-1284.
[2] Shuai-shuai SUN,Chun-xiao FENG,Liang ZHANG. Path planning for multimodal quadruped robots based on discrete sampling [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(4): 1120-1128.
[3] Zhi-gang JIN,Ren-jun SU,Xiao-fang ZHAO. Psychological assessment method based on heterogeneous graph network [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(4): 1078-1085.
[4] Jing-peng GAO,Guo-xuan WANG,Lu GAO. LSTM⁃MADDPG multi⁃agent cooperative decision algorithm based on asynchronous collaborative update [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(3): 797-806.
[5] Bo-song FAN,Chun-fu SHAO. Urban rail transit emergency risk level identification method [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(2): 427-435.
[6] Liu LIU,Kun DING,Shan-shan LIU,Ming LIU. Event detection method as machine reading comprehension [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(2): 533-539.
[7] Hao YUE,Qi-yue ZHANG,Zi-yu YANG,Meng-jie REN,Xu ZHANG. Iterative weighted algorithms of static congestion traffic assignment considering spatial queuing [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(1): 136-145.
[8] Zhi-dan CAI,Ming FANG,Zhe LI,Jia-lu XU. Blind remote sensing image deblurring algorithm based on Gaussian curvature and reweighted graph total variation [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(9): 2649-2658.
[9] Shi-jun SONG,Min FAN. Design of big data anomaly detection model based on random forest algorithm [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(9): 2659-2665.
[10] Guang HUO,Da-wei LIN,Yuan-ning LIU,Xiao-dong ZHU,Meng YUAN,Di GAI. Lightweight iris segmentation model based on multiscale feature and attention mechanism [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(9): 2591-2600.
[11] Ying HE,Zhuo-ran WANG,Xu ZHOU,Yan-heng LIU. Point of interest recommendation algorithm integrating social geographical information based on weighted matrix factorization [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(9): 2632-2639.
[12] Jian LI,Qi XIONG,Ya-ting HU,Kong-yu LIU. Chinese named entity recognition method based on Transformer and hidden Markov model [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(5): 1427-1434.
[13] Nan ZHANG,Jian-hua SHI,Ji YI,Ping WANG. Real⁃time tracking method of underground moving target based on weighted centroid positioning [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(5): 1458-1464.
[14] Ben-huai LI,Yan-wen LIU,Lu WANG,Xue-qian CHEN,Wen-jie ZUO. Crashworthiness optimization for frontal⁃end structure of rail vehicle constrained with absorbed energy and crushed displacements [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(4): 982-988.
[15] Yue-kun MA,Yi-feng HAO. Fast recognition method of short text named entities considering feature sparsity [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(12): 3529-3535.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!