Journal of Jilin University(Engineering and Technology Edition) ›› 2021, Vol. 51 ›› Issue (5): 1808-1816.doi: 10.13229/j.cnki.jdxbgxb20200645

Previous Articles     Next Articles

SP⁃k⁃means-+ algorithm based on simulated partition

Yong YANG1,2(),Qiang CHEN1,Fu-heng QU1,Jun-jie LIU1,Lei ZHANG1   

  1. 1.College of Computer Science and Technology,Changchun University of Science and Technology,Changchun 130022,China
    2.College of Education,Changchun Normal University,Changchun 130032,China
  • Received:2020-08-21 Online:2021-09-01 Published:2021-09-16

Abstract:

K-means algorithm is easy to fall into local optimal solution, which leads to poor optimization effect of objective function, especially when k-means is applied to data sets which the number of data elements, data dimension and clusters are large. To solve this problem, iterative k-means minus–plus(I-k-means-+) is proposed in 2018, as a new clustering algorithm of objective function optimization for k-means, which improves iteratively the quality of solution of clustering by removing one cluster (minus), dividing another one (plus), from the generated clustering solution, and updating each cluster with topical k-means. However, I-k-means-+ algorithm roughly estimates the gain value and cost value of each cluster, sacrifices precision for lower computational complexity and affects the optimization of objective function. Focus on this problem, in this paper, there is a modified I-k-means-+ algorithm called SP-k-means-+ which propose to calculate the gain value and cost value of each cluster more accurately, based on simulated partition of every cluster. It reduces the possibility of missing detection and misjudgment in the process of cluster pair matching and avoids the redundant computation caused by invalid iteration in I-k-means-+ algorithm, selects more suitable cluster pairs to divide or delete in each iteration. SP-k-means-+ algorithm optimizes the objective function further, improves partial computing efficiency, and the simulated partition is accelerating by a new heuristic acceleration algorithm. Results of experiments according to real data set show that, given the same initial clustering solution, when there is no -+ process, the efficiency of SP-k-means-+ algorithm is improved by 16%; when there is -+ process, SP-k-means-+ algorithm is improved by 10%—47% in objective function optimization compared with I-k-means-+ algorithm, meanwhile two algorithm has close runtimes.

Key words: artificial intelligence, I-k-means-+, objective function optimization, divide and remove, simulated partition, suitable cluster pairs

CLC Number: 

  • TP391
1 Jain A K. Data clustering: 50 years beyond k-means[J]. Pattern Recognition Letters, 2010, 31(8): 651-666.
2 孙吉贵, 刘杰, 赵连宇. 聚类算法研究[J]. 软件学报, 2008, 19(1): 48-61.
Sun Ji-gui, Liu Jie, Zhao Lian-yu. Clustering algorithms research[J]. Journal of Software, 2008, 19(1): 48-61.
3 徐森, 卢志茂, 顾国昌. 结合k均值和非负矩阵分解集成文本聚类算法[J]. 吉林大学学报: 工学版, 2011, 41(4): 1077-1082.
Xu Sen, Lu Zhi-mao, Gu Guo-chang. Integrating k-means and non-negative matrix factorization to ensemble document clustering[J]. Journal of Jilin University (Engineering and Technology Edition), 2011, 41(4): 1077-1082.
4 李宾, 周旭, 梅芳, 等. 基于k-means和矩阵分解的位置推荐算法[J]. 吉林大学学报: 工学版, 2019, 49(5): 1653-1660.
Li Bin, Zhou Xu, Mei Fang, et al. Location recommendation algorithm based on k-means and matrix factorization[J]. Journal of Jilin University(Engineering and Technology Edition), 2019, 49(5): 1653-1660.
5 Fränti P, Sieranoja S. How much k-means can be improved by using better initialization and repeats?[J]. Pattern Recognition, 2019, 93: 95-112.
6 谢娟英, 王艳娥. 最小方差优化初始聚类中心的k-means算法[J]. 计算机工程, 2014, 40(8): 205-211, 223.
Xie Juan-ying, Wang Yan-e. k-means algorithm based on minimum deviation initialized clustering centers[J]. Computer Engineering, 2014, 40(8): 205-211, 223.
7 孟子健, 马江洪. 一种可选初始聚类中心的改进k均值算法[J]. 统计与决策, 2014(12): 12-14.
Meng Zi-jian, Ma Jiang-hong. An improved k-means algorithm with optional initial clustering centers[J]. Statistics & Decision, 2014(12): 12-14.
8 Arthur D, Vassilvitskii S. K-means++: the advantages of careful seeding[J]. Proceedings of the Eighteenth Annual ACM-SIAM Symposiumon Discrete algorithms, Society for Industrial and Applied Mathematics, 2007, 11(6): 1027-1035.
9 钟熙, 孙祥娥. 基于k-means++聚类的朴素贝叶斯集成方法研究[J]. 计算机科学, 2019, 46(): 439-441, 451.
Zhong Xi, Sun Xiang-e. Research on naive bayes ensemble method based on k-means++ clustering[J]. Computer Science, 2019, 46(Sup.1): 439-441, 451.
10 黄岚, 李玉, 王贵参, 等. 基于点距离和密度峰值聚类的社区发现方法[J]. 吉林大学学报: 工学版, 2016, 46(6): 2042-2051.
Huang Lan, Li Yu, Wang Gui-shen, et al. Community detection method based on vertex distance and clustering of density peaks[J]. Journal of Jilin University(Engineering and Technology Edition), 2016, 46(6): 2042-2051.
11 唐泽坤, 朱泽宇, 杨裔, 等. 基于距离和密度的D-k-means算法[J]. 计算机应用研究, 2020, 37(6): 1719-1723.
Tang Ze-kun, Zhu Ze-yu, Yang Yi, et al. D-k-means algorithm based on distance and density[J]. Application Research of Computers, 2020, 37(6): 1719-1723.
12 Tzortzis G, Likas A. The MinMax k-means clustering algorithm[J]. Pattern Recognition, 2014, 47(7): 2505-2516.
13 Wang X Y, Bai Y P. A modified minmax k-means algorithm based on PSO[J]. Computational Intelligence & Neuroscience, 2016(7): 27656201.
[1] Ya-hui ZHAO,Fei-yang YANG,Zhen-guo ZHANG,Rong-yi CUI. Korean text structure discovery based on reinforcement learning and attention mechanism [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(4): 1387-1395.
[2] Yan-hua DONG,Jing-wei LIU,Jing-hua ZHAO,Liang LI,Fang-xi XIE. Real-time torque tracking control based on BPNN online learning prediction model [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(4): 1405-1413.
[3] Fu LIU,Yi-xin LIANG,Tao HOU,Yang SONG,Bing KANG,Yun LIU. Improvement of fuzzy c-harmonic mean algorithm on unbalanced data [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(4): 1447-1453.
[4] Fu-hua SHANG,Mao-jun CAO,Cai-zhi WANG. Local outlier data mining based on artificial intelligence technology [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(2): 692-696.
[5] Hai-ying ZHAO,Wei ZHOU,Xiao-gang HOU,Xiao-li ZHANG. Double-layer annotation of traditional costume images based on multi-task learning [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(1): 293-302.
[6] Dan-tong OUYANG,Cong MA,Jing-pei LEI,Sha-sha FENG. Knowledge graph embedding with adaptive sampling [J]. Journal of Jilin University(Engineering and Technology Edition), 2020, 50(2): 685-691.
[7] Yi-bin LI,Jia-min GUO,Qin ZHANG. Methods and technologies of human gait recognition [J]. Journal of Jilin University(Engineering and Technology Edition), 2020, 50(1): 1-18.
[8] Qian XU,Ying LI,Gang WANG. Pedestrian-vehicle detection based on deep learning [J]. Journal of Jilin University(Engineering and Technology Edition), 2019, 49(5): 1661-1667.
[9] Wan-fu GAO,Ping ZHANG,Liang HU. Nonlinear feature selection method based on dynamic change of selected features [J]. Journal of Jilin University(Engineering and Technology Edition), 2019, 49(4): 1293-1300.
[10] Dan⁃tong OUYANG,Jun XIAO,Yu⁃xin YE. Distant supervision for relation extraction with weakconstraints of entity pairs [J]. Journal of Jilin University(Engineering and Technology Edition), 2019, 49(3): 912-919.
[11] GU Hai-jun, TIAN Ya-qian, CUI Ying. Intelligent interactive agent for home service [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1578-1585.
[12] DONG Sa, LIU Da-you, OUYANG Ruo-chuan, ZHU Yun-gang, LI Li-na. Logistic regression classification in networked data with heterophily based on second-order Markov assumption [J]. Journal of Jilin University(Engineering and Technology Edition), 2018, 48(5): 1571-1577.
[13] WANG Xu, OUYANG Ji-hong, CHEN Gui-fen. Measurement of graph similarity based on vertical dimension sequence dynamic time warping method [J]. 吉林大学学报(工学版), 2018, 48(4): 1199-1205.
[14] ZHANG Hao, ZHAN Meng-ping, GUO Liu-xiang, LI Zhi, LIU Yuan-ning, ZHANG Chun-he, CHANG Hao-wu, WANG Zhi-qiang. Human exogenous plant miRNA cross-kingdom regulatory modeling based on high-throughout data [J]. 吉林大学学报(工学版), 2018, 48(4): 1206-1213.
[15] LI Xiong-fei, FENG Ting-ting, LUO Shi, ZHANG Xiao-li. Automatic music composition algorithm based on recurrent neural network [J]. 吉林大学学报(工学版), 2018, 48(3): 866-873.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!