Journal of Jilin University(Engineering and Technology Edition) ›› 2024, Vol. 54 ›› Issue (5): 1417-1425.doi: 10.13229/j.cnki.jdxbgxb.20220779

Previous Articles    

Density peaks clustering with second-order K-nearest neighbors and multi-cluster merging

Li LYU1,2(),Mei-zi ZHU1,2,Ping KANG1,2,Long-zhe HAN1,2   

  1. 1.School of Information Engineering,Nanchang Institute of Technology,Nanchang 330099,China
    2.Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City,Nanchang Institute of Technology,Nanchang 330099,China
  • Received:2022-06-20 Online:2024-05-01 Published:2024-06-11

Abstract:

In the face of manifold data, the local density of density peaks clustering (DPC) algorithm is easy to find the wrong cluster center and the allocation strategy is easy to cause the residual samples far from the cluster center to be misallocation. In view of the above problems, this paper proposes density peaks clustering with second-order K-nearest neighbors and multi-cluster merging. Firstly, the minimum second-order K-nearest neighbor is used to define the local density, highlighting the density difference between the cluster center and the non-cluster center, so as to find the correct cluster center; Secondly, the K-nearest neighbor is used to find the local representative points of the sample and determine the core points, and the core points are used to guide the micro-cluster division; Finally, the inter-cluster attraction defined by the minimum second-order K-nearest neighbor and shared nearest neighbor is used to merge the micro-clusters, which avoids the misallocation of samples away from the cluster center, and the micro-cluster merging process does not require iteration. In this paper, DPC-SKMM algorithm is compared with IDPC-FA, DPCSA, FNDPC, FKNN-DPC, DPC algorithm. Experimental results show that DPC-SKMM algorithm can cluster manifolds and UCI data sets effectively.

Key words: density peaks clustering, manifold data, second-order K-nearest neighbors, K-nearest neighbor, attractiveness, multi-cluster merging strategy

CLC Number: 

  • TP301.6

Table 1

Clustering results of six algorithms on eight manifold datasets"

算法AMIARIFMIArg-AMIARIFMIArg-
DbCircle
DPC-SKMM1111711126
IDPC-FA0.652 60.503 30.699 9-0.462 90.438 50.765 2-
DPCSA0.413 60.109 60.468 9-0.295 00.083 30.524 2-
FNDPC0.509 80.271 40.580 30.090.423 60.273 20.586 30.29
FKNN-DPC0.510 70.271 80.579 3190.706 30.613 90.779 032
DPC0.518 50.279 40.585 34.00.359 60.301 50.604 80.3
JainCmc
DPC-SKMM111611111
IDPC-FA111-0.809 30.842 10.902 7-
DPCSA0.216 70.216 70.216 7-0.665 60.576 10.745 4-
FNDPC0.596 10.596 10.596 10.470.809 30.842 10.902 70.28
FKNN-DPC0.709 20.709 20.709 24311149
DPC0.618 30.618 30.618 30.80.385 70.266 10.537 75
CompoundLineBlobs
DPC-SKMM0.880 20.892 90.921 581115
IDPC-FA0.792 20.832 70.881 5-111-
DPCSA0.839 20.828 40.870 7-111-
FNDPC0.723 90.533 70.644 00.090.638 60.638 60.638 60.21
FKNN-DPC0.838 10.841 80.887 771117
DPC0.775 40.591 00.687 64.00.837 50.837 50.837 54.2
SticksSpiral
DPC-SKMM11151114
IDPC-FA111-111-
DPCSA0.763 40.763 40.763 4-111-
FNDPC1110.221110.07
FKNN-DPC11171115
DPC0.809 40.809 40.809 421111.8

Table 2

Friedman values of six algorithms on eight manifold datasets"

算法秩均值
AMIARIFMI
DPC-SKMM5.195.195.19
IDPC-FA4.194.314.31
DPCSA2.382.132.13
FNDPC2.562.442.56
FKNN-DPC4.254.384.25
DPC2.442.562.56

Fig.1

Clustering results of six algorithms on Db dataset"

Table 3

Clustering effects of six algorithms on eight UCI datasets"

算法AMIARIFMIArg-AMIARIFMIArg-
IrisWine
DPC-SKMM0.878 60.903 80.935 5160.859 90.896 20.931 27
IDPC-FA0.862 30.885 70.923 3-0.767 50.771 30.847 8-
DPCSA0.883 10.903 80.935 5-0.748 00.741 40.828 3-
FNDPC0.883 10.903 80.935 50.110.789 80.802 50.868 60.26
FKNN-DPC0.883 10.903 80.935 5220.848 10.883 90.922 98
DPC0.860 60.885 70.923 30.20.706 50.672 40.783 52
SeedsEcoli
DPC-SKMM0.789 50.835 40.889 850.656 40.743 30.814 02
IDPC-FA0.729 90.767 00.844 4-0.663 80.756 10.828 4-
DPCSA0.660 90.687 30.791 8-0.440 60.459 30.646 7-
FNDPC0.713 60.754 50.836 10.070.483 30.561 80.717 80.35
FKNN-DPC0.775 70.802 40.868 290.587 80.589 40.702 72
DPC0.729 90.767 00.844 40.70.497 80.446 50.577 50.4
waveformLibras
DPC-SKMM0.394 20.380 30.588 6230.556 40.358 80.423 75
IDPC-FA0.353 70.291 40.530 0-0.573 30.381 60.424 7-
DPCSA0.251 00.223 60.532 7-0.538 80.309 50.379 1-
FNDPC0.352 10.282 60.530 60.210.549 40.329 00.386 90.17
FKNN-DPC0.323 90.267 10.524 420.555 40.345 90.404 410
DPC0.340 00.275 90.533 63.10.535 80.319 30.371 70.3
ParkinsonsWdbc
DPC-SKMM0.243 00.413 50.820 3200.703 20.811 70.912 226
IDPC-FA0.215 10.363 20.819 0-0.623 70.742 30.882 9-
DPCSA0.072 80.160 10.658 2-0.336 10.377 10.759 5-
FNDPC0.177 20.268 60.814 070.607 60.364 50.875 80.05
FKNN-DPC0.215 10.363 20.819 00.040.642 30.761 30.889 42
DPC0.247 80.125 60.618 71.20.436 60.496 40.794 10.5

Table 4

Friedman values of six algorithms on eight UCI datasets"

算法秩均值
AMIARIFMI
DPC-SKMM5.255.565.56
IDPC-FA4.134.193.81
DPCSA1.751.942.31
FNDPC3.133.063.31
FKNN-DPC4.194.254.00
DPC2.562.002.00
1 Djenouri Y, Belhadi A, Fournier-Viger P, et al. Fast and effective cluster-based information retrieval using frequent closed itemsets[J]. Information Sciences, 2018, 453: 154-167.
2 Zhao H, He C. Objective cluster analysis in value-based customer segmentation method[C]∥Proceedings of the 2nd International Workshop on Knowledge Discovery and Data Mining. Piscataway NJ: IEEE, 2009: 484-487.
3 刘仲民, 李战明, 李博皓, 等. 基于稀疏矩阵的谱聚类图像分割算法[J]. 吉林大学学报: 工学版, 2017, 47(4): 1308-1313.
Liu Zhong-min, Li Zhan-ming, Li Bo-hao, et al. Spectral clustering image segmentation based on sparse matrix[J]. Journal of Jilin University (Engineering and Technology Edition), 2017, 47(4): 1308- 1313.
4 赵嘉, 姚占峰, 吕莉, 等.基于相互邻近度的密度峰值聚类算法[J]. 控制与决策, 2021, 36(3): 543-552.
Zhao Jia, Yao Zhan-feng, Li Lyu, et al. Density peaks clustering based on mutual neighbor degree[J]. Control and Decision, 2021, 36(3): 543-552.
5 曲福恒, 潘曰涛, 杨勇,等. 基于加权空间划分的高效全局k-means聚类算法[J/OL]. 吉林大学学报: 工学版,[2024-02-28].
Qu Fu-heng, Pan Yue-tao, Yang Yong, et al. An efficient global k-means clustering algorithm based on weighted space partitioning[J/OL]. Journal of Jilin University (Engineering and Technology Edition), [2024-02-28].
6 Liew A W C, Yan H, Yang M. Pattern recognition techniques for the emerging field of bioinformatics: a review[J]. Pattern Recognition, 2005, 38(11): 2055-2073.
7 Rodriguez A, Lsio A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492-1496.
8 Wang Y, Yang Y. Relative density-based clustering algorithm for identifying diverse density clusters effectively[J]. Neural Computing and Applications, 2021, 33(16): 10141-10157.
9 Li C, Zhang Y. Density peak clustering based on relative density optimization[J]. Mathematical Problems in Engineering, 2020, 2020: No2816102.
10 赵嘉, 王刚, 吕莉, 等. 面向流形数据的测地距离与余弦互逆近邻密度峰值聚类算法[J]. 电子学报, 2022, 50(11): 2730-2737.
Zhao Jia, Wang Gang, Li Lyu, et al. Density peaks clustering algorithm based on geodesic distance and cosine mutual reverse nearest neighbor for manifold datasets[J]. Acta Electronica Sinica, 2022, 50(11): 2730-2737.
11 Yuan X, Yu H, Liang J, et al. A novel density peaks clustering algorithm based on K nearest neighbors with adaptive merging strategy[J]. International Journal of Machine Learning and Cybernetics, 2021, 12(10): 2825-2841.
12 吴润秀, 尹士豪, 赵嘉, 等. 基于相对密度估计和多簇合并的密度峰值聚类算法[J]. 控制与决策, 2023, 38(4): 1047-1055.
Wu Run-xiu, Yin Shi-hao, Zhao Jia, et al. Density peaks clustering based on relative density estimating and multi cluster merging[J]. Journal of Control and Decision, 2023, 38(4): 1047-1055.
13 赵嘉, 陈磊, 吴润秀, 等. K近邻和加权相似性的密度峰值聚类算法[J]. 控制理论与应用, 2022, 39(12): 2349-2357.
Zhao Jia, Chen Lei, Wu Run-xiu, et al. Density peaks clustering based on K-nearest neighbors and weighted similarity[J]. Control Theory & Applications, 2022, 39(12): 2349-2357.
14 Du M, Ding S, Xu X, et al. Density peaks clustering using geodesic distances[J]. International Journal of Machine Learning and Cybernetics, 2018, 9(8): 1335-1349.
15 王大刚, 丁世飞, 钟锦. 基于二阶 k 近邻的密度峰值聚类算法研究[J]. 计算机科学与探索, 2021, 15(8): 1490-1500.
Wang Da-gang, Ding Shi-fei, Zhong Jin. Research of density peaks clustering algorithm based on second-order k neighbors[J]. Journal of frontiers of computer science and technology, 2021, 15(8): 1490-1500.
16 Cheng D, Zhang S, Huang J. Dense members of local cores-based density peaks clustering algorithm[J]. Knowledge-Based Systems, 2020, 193: 105454
17 陈蔚昌, 赵嘉, 肖人彬, 等.面向密度分布不均数据的近邻优化密度峰值聚类算法[J].控制与决策, 2024, 39(3): 919-928.
Chen Wei-chang, Zhao Jia, Xiao Ren-bin, et al. Density peaks clustering algorithm with nearest neighbor optimization for data with uneven density distribution[J]. Control and Decision, 2024, 39(3): 919-928.
18 Zhao J, Tang J, Shi A, et al. Improved density peaks clustering based on firefly algorithm[J]. International Journal of Bio-Inspired Computation, 2020, 15(1): 24-42.
19 Sun L, Liu R, Xu J, et al. An adaptive density peaks clustering method with Fisher linear discriminant[J]. IEEE Access, 2019, 7: 72936-72955.
20 Du M, Ding S, Xue Y. A robust density peaks clustering algorithm using fuzzy neighborhood[J]. International Journal of Machine Learning and Cybernetics, 2018, 9(7): 1131-1140.
21 Cheng D, Zhu Q, Huang J, et al. Natural neighbor-based clustering algorithm with density peeks[C]∥2016 International Joint Conference on Neural Networks, 2016: 92-98.
22 Vinh N X, Epps J, BAILEY J. Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance[J]. The Journal of Machine Learning Research, 2010, 11: 2837-2854.
23 Fowlkes E B, Mallows C L. A method for comparing two hierarchical clusterings[J]. Journal of the American Statistical Association, 1983, 78(383): 553-569.
[1] Qian CAO,Zhi-hui LI,Peng-fei TAO,Yong-jian MA,Chen-xi YANG. Traffic accident risk assessment method for road network considering risk heterogeneity [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(10): 2817-2825.
[2] Kai LU,Wei WU,Guan-rong LIN,Xin TIAN,Jian-min XU. Combination forecasting model for number of assembling passengers at transportation terminal based on KNN regression algorithm [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(4): 1241-1250.
[3] Fu LIU, Mei-jing QUAN, Ke WANG, Yun LIU, Bing KANG, Zhi-wu HAN, Tao HOU. Indoor positioning method based on location fingerprinting of imitating mechanism of scorpion vibration source [J]. Journal of Jilin University(Engineering and Technology Edition), 2019, 49(6): 2076-2082.
[4] HUANG Lan, LI Yu, WANG Gui-shen, WANG Yan. Community detection method based on vertex distance and clustering of density peaks [J]. 吉林大学学报(工学版), 2016, 46(6): 2042-2051.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!