吉林大学学报(工学版) ›› 2024, Vol. 54 ›› Issue (5): 1417-1425.doi: 10.13229/j.cnki.jdxbgxb.20220779

• 计算机科学与技术 • 上一篇    

二阶K近邻和多簇合并的密度峰值聚类算法

吕莉1,2(),朱梅子1,2,康平1,2,韩龙哲1,2   

  1. 1.南昌工程学院 信息工程学院,南昌 330099
    2.南昌工程学院 南昌市智慧城市物联感知与协同计算重点实验室,南昌 330099
  • 收稿日期:2022-06-20 出版日期:2024-05-01 发布日期:2024-06-11
  • 作者简介:吕莉(1982-),女,教授,博士. 研究方向:智能计算与计算智能,目标跟踪与检测,大数据与人工智能. E-mail: lvli623@163.com
  • 基金资助:
    国家自然科学基金项目(62066030)

Density peaks clustering with second-order K-nearest neighbors and multi-cluster merging

Li LYU1,2(),Mei-zi ZHU1,2,Ping KANG1,2,Long-zhe HAN1,2   

  1. 1.School of Information Engineering,Nanchang Institute of Technology,Nanchang 330099,China
    2.Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City,Nanchang Institute of Technology,Nanchang 330099,China
  • Received:2022-06-20 Online:2024-05-01 Published:2024-06-11

摘要:

针对流形数据中密度峰值聚类(DPC)算法的局部密度易找到错误的类簇中心,且分配策略易导致远离类簇中心的剩余样本被错误分配的问题,本文提出二阶K近邻和多簇合并的密度峰值聚类(DPC-SKMM)算法。首先,利用最小二阶K近邻定义局部密度,凸显类簇中心与非类簇中心间的密度差异,从而找到正确的类簇中心;其次,利用K近邻找出样本局部代表点并依此确定核心点,用核心点指导微簇划分;最后,利用最小二阶K近邻及共享近邻定义的微簇间吸引度合并微簇,避免远离类簇中心的样本被错误分配,且微簇合并过程无须迭代。本文将DPC-SKMM算法与IDPC-FA、DPCSA、FNDPC、FKNN-DPC、DPC算法进行对比,实验结果表明,DPC-SKMM算法能有效聚类流形及UCI数据集。

关键词: 密度峰值聚类, 流形数据, 二阶K近邻, K近邻, 吸引度, 多簇合并策略

Abstract:

In the face of manifold data, the local density of density peaks clustering (DPC) algorithm is easy to find the wrong cluster center and the allocation strategy is easy to cause the residual samples far from the cluster center to be misallocation. In view of the above problems, this paper proposes density peaks clustering with second-order K-nearest neighbors and multi-cluster merging. Firstly, the minimum second-order K-nearest neighbor is used to define the local density, highlighting the density difference between the cluster center and the non-cluster center, so as to find the correct cluster center; Secondly, the K-nearest neighbor is used to find the local representative points of the sample and determine the core points, and the core points are used to guide the micro-cluster division; Finally, the inter-cluster attraction defined by the minimum second-order K-nearest neighbor and shared nearest neighbor is used to merge the micro-clusters, which avoids the misallocation of samples away from the cluster center, and the micro-cluster merging process does not require iteration. In this paper, DPC-SKMM algorithm is compared with IDPC-FA, DPCSA, FNDPC, FKNN-DPC, DPC algorithm. Experimental results show that DPC-SKMM algorithm can cluster manifolds and UCI data sets effectively.

Key words: density peaks clustering, manifold data, second-order K-nearest neighbors, K-nearest neighbor, attractiveness, multi-cluster merging strategy

中图分类号: 

  • TP301.6

表1

6种算法在8个流形数据集上的聚类结果"

算法AMIARIFMIArg-AMIARIFMIArg-
DbCircle
DPC-SKMM1111711126
IDPC-FA0.652 60.503 30.699 9-0.462 90.438 50.765 2-
DPCSA0.413 60.109 60.468 9-0.295 00.083 30.524 2-
FNDPC0.509 80.271 40.580 30.090.423 60.273 20.586 30.29
FKNN-DPC0.510 70.271 80.579 3190.706 30.613 90.779 032
DPC0.518 50.279 40.585 34.00.359 60.301 50.604 80.3
JainCmc
DPC-SKMM111611111
IDPC-FA111-0.809 30.842 10.902 7-
DPCSA0.216 70.216 70.216 7-0.665 60.576 10.745 4-
FNDPC0.596 10.596 10.596 10.470.809 30.842 10.902 70.28
FKNN-DPC0.709 20.709 20.709 24311149
DPC0.618 30.618 30.618 30.80.385 70.266 10.537 75
CompoundLineBlobs
DPC-SKMM0.880 20.892 90.921 581115
IDPC-FA0.792 20.832 70.881 5-111-
DPCSA0.839 20.828 40.870 7-111-
FNDPC0.723 90.533 70.644 00.090.638 60.638 60.638 60.21
FKNN-DPC0.838 10.841 80.887 771117
DPC0.775 40.591 00.687 64.00.837 50.837 50.837 54.2
SticksSpiral
DPC-SKMM11151114
IDPC-FA111-111-
DPCSA0.763 40.763 40.763 4-111-
FNDPC1110.221110.07
FKNN-DPC11171115
DPC0.809 40.809 40.809 421111.8

表2

6种算法在8个流形数据集上评价指标的Friedman值"

算法秩均值
AMIARIFMI
DPC-SKMM5.195.195.19
IDPC-FA4.194.314.31
DPCSA2.382.132.13
FNDPC2.562.442.56
FKNN-DPC4.254.384.25
DPC2.442.562.56

图1

6种算法在Db数据集上的聚类结果"

表3

6种算法在8个UCI数据集上的聚类结果"

算法AMIARIFMIArg-AMIARIFMIArg-
IrisWine
DPC-SKMM0.878 60.903 80.935 5160.859 90.896 20.931 27
IDPC-FA0.862 30.885 70.923 3-0.767 50.771 30.847 8-
DPCSA0.883 10.903 80.935 5-0.748 00.741 40.828 3-
FNDPC0.883 10.903 80.935 50.110.789 80.802 50.868 60.26
FKNN-DPC0.883 10.903 80.935 5220.848 10.883 90.922 98
DPC0.860 60.885 70.923 30.20.706 50.672 40.783 52
SeedsEcoli
DPC-SKMM0.789 50.835 40.889 850.656 40.743 30.814 02
IDPC-FA0.729 90.767 00.844 4-0.663 80.756 10.828 4-
DPCSA0.660 90.687 30.791 8-0.440 60.459 30.646 7-
FNDPC0.713 60.754 50.836 10.070.483 30.561 80.717 80.35
FKNN-DPC0.775 70.802 40.868 290.587 80.589 40.702 72
DPC0.729 90.767 00.844 40.70.497 80.446 50.577 50.4
waveformLibras
DPC-SKMM0.394 20.380 30.588 6230.556 40.358 80.423 75
IDPC-FA0.353 70.291 40.530 0-0.573 30.381 60.424 7-
DPCSA0.251 00.223 60.532 7-0.538 80.309 50.379 1-
FNDPC0.352 10.282 60.530 60.210.549 40.329 00.386 90.17
FKNN-DPC0.323 90.267 10.524 420.555 40.345 90.404 410
DPC0.340 00.275 90.533 63.10.535 80.319 30.371 70.3
ParkinsonsWdbc
DPC-SKMM0.243 00.413 50.820 3200.703 20.811 70.912 226
IDPC-FA0.215 10.363 20.819 0-0.623 70.742 30.882 9-
DPCSA0.072 80.160 10.658 2-0.336 10.377 10.759 5-
FNDPC0.177 20.268 60.814 070.607 60.364 50.875 80.05
FKNN-DPC0.215 10.363 20.819 00.040.642 30.761 30.889 42
DPC0.247 80.125 60.618 71.20.436 60.496 40.794 10.5

表4

6种算法在8个UCI数据集上评价指标的Friedman值"

算法秩均值
AMIARIFMI
DPC-SKMM5.255.565.56
IDPC-FA4.134.193.81
DPCSA1.751.942.31
FNDPC3.133.063.31
FKNN-DPC4.194.254.00
DPC2.562.002.00
1 Djenouri Y, Belhadi A, Fournier-Viger P, et al. Fast and effective cluster-based information retrieval using frequent closed itemsets[J]. Information Sciences, 2018, 453: 154-167.
2 Zhao H, He C. Objective cluster analysis in value-based customer segmentation method[C]∥Proceedings of the 2nd International Workshop on Knowledge Discovery and Data Mining. Piscataway NJ: IEEE, 2009: 484-487.
3 刘仲民, 李战明, 李博皓, 等. 基于稀疏矩阵的谱聚类图像分割算法[J]. 吉林大学学报: 工学版, 2017, 47(4): 1308-1313.
Liu Zhong-min, Li Zhan-ming, Li Bo-hao, et al. Spectral clustering image segmentation based on sparse matrix[J]. Journal of Jilin University (Engineering and Technology Edition), 2017, 47(4): 1308- 1313.
4 赵嘉, 姚占峰, 吕莉, 等.基于相互邻近度的密度峰值聚类算法[J]. 控制与决策, 2021, 36(3): 543-552.
Zhao Jia, Yao Zhan-feng, Li Lyu, et al. Density peaks clustering based on mutual neighbor degree[J]. Control and Decision, 2021, 36(3): 543-552.
5 曲福恒, 潘曰涛, 杨勇,等. 基于加权空间划分的高效全局k-means聚类算法[J/OL]. 吉林大学学报: 工学版,[2024-02-28].
Qu Fu-heng, Pan Yue-tao, Yang Yong, et al. An efficient global k-means clustering algorithm based on weighted space partitioning[J/OL]. Journal of Jilin University (Engineering and Technology Edition), [2024-02-28].
6 Liew A W C, Yan H, Yang M. Pattern recognition techniques for the emerging field of bioinformatics: a review[J]. Pattern Recognition, 2005, 38(11): 2055-2073.
7 Rodriguez A, Lsio A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492-1496.
8 Wang Y, Yang Y. Relative density-based clustering algorithm for identifying diverse density clusters effectively[J]. Neural Computing and Applications, 2021, 33(16): 10141-10157.
9 Li C, Zhang Y. Density peak clustering based on relative density optimization[J]. Mathematical Problems in Engineering, 2020, 2020: No2816102.
10 赵嘉, 王刚, 吕莉, 等. 面向流形数据的测地距离与余弦互逆近邻密度峰值聚类算法[J]. 电子学报, 2022, 50(11): 2730-2737.
Zhao Jia, Wang Gang, Li Lyu, et al. Density peaks clustering algorithm based on geodesic distance and cosine mutual reverse nearest neighbor for manifold datasets[J]. Acta Electronica Sinica, 2022, 50(11): 2730-2737.
11 Yuan X, Yu H, Liang J, et al. A novel density peaks clustering algorithm based on K nearest neighbors with adaptive merging strategy[J]. International Journal of Machine Learning and Cybernetics, 2021, 12(10): 2825-2841.
12 吴润秀, 尹士豪, 赵嘉, 等. 基于相对密度估计和多簇合并的密度峰值聚类算法[J]. 控制与决策, 2023, 38(4): 1047-1055.
Wu Run-xiu, Yin Shi-hao, Zhao Jia, et al. Density peaks clustering based on relative density estimating and multi cluster merging[J]. Journal of Control and Decision, 2023, 38(4): 1047-1055.
13 赵嘉, 陈磊, 吴润秀, 等. K近邻和加权相似性的密度峰值聚类算法[J]. 控制理论与应用, 2022, 39(12): 2349-2357.
Zhao Jia, Chen Lei, Wu Run-xiu, et al. Density peaks clustering based on K-nearest neighbors and weighted similarity[J]. Control Theory & Applications, 2022, 39(12): 2349-2357.
14 Du M, Ding S, Xu X, et al. Density peaks clustering using geodesic distances[J]. International Journal of Machine Learning and Cybernetics, 2018, 9(8): 1335-1349.
15 王大刚, 丁世飞, 钟锦. 基于二阶 k 近邻的密度峰值聚类算法研究[J]. 计算机科学与探索, 2021, 15(8): 1490-1500.
Wang Da-gang, Ding Shi-fei, Zhong Jin. Research of density peaks clustering algorithm based on second-order k neighbors[J]. Journal of frontiers of computer science and technology, 2021, 15(8): 1490-1500.
16 Cheng D, Zhang S, Huang J. Dense members of local cores-based density peaks clustering algorithm[J]. Knowledge-Based Systems, 2020, 193: 105454
17 陈蔚昌, 赵嘉, 肖人彬, 等.面向密度分布不均数据的近邻优化密度峰值聚类算法[J].控制与决策, 2024, 39(3): 919-928.
Chen Wei-chang, Zhao Jia, Xiao Ren-bin, et al. Density peaks clustering algorithm with nearest neighbor optimization for data with uneven density distribution[J]. Control and Decision, 2024, 39(3): 919-928.
18 Zhao J, Tang J, Shi A, et al. Improved density peaks clustering based on firefly algorithm[J]. International Journal of Bio-Inspired Computation, 2020, 15(1): 24-42.
19 Sun L, Liu R, Xu J, et al. An adaptive density peaks clustering method with Fisher linear discriminant[J]. IEEE Access, 2019, 7: 72936-72955.
20 Du M, Ding S, Xue Y. A robust density peaks clustering algorithm using fuzzy neighborhood[J]. International Journal of Machine Learning and Cybernetics, 2018, 9(7): 1131-1140.
21 Cheng D, Zhu Q, Huang J, et al. Natural neighbor-based clustering algorithm with density peeks[C]∥2016 International Joint Conference on Neural Networks, 2016: 92-98.
22 Vinh N X, Epps J, BAILEY J. Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance[J]. The Journal of Machine Learning Research, 2010, 11: 2837-2854.
23 Fowlkes E B, Mallows C L. A method for comparing two hierarchical clusterings[J]. Journal of the American Statistical Association, 1983, 78(383): 553-569.
[1] 翁剑成,魏瑞聪,何寒梅,徐海辉,王晶晶. 基于关联路链组的城市路网短时交通流预测模型[J]. 吉林大学学报(工学版), 2023, 53(11): 3104-3112.
[2] 曹倩,李志慧,陶鹏飞,马永建,杨晨曦. 考虑风险异质特性的路网交通事故风险评估方法[J]. 吉林大学学报(工学版), 2023, 53(10): 2817-2825.
[3] 魏路,高磊,李晋宏,杨建,田玉林. 基于密度峰值聚类的交通控制子区划分方法[J]. 吉林大学学报(工学版), 2023, 53(1): 124-131.
[4] 刘富, 权美静, 王柯, 刘云, 康冰, 韩志武, 侯涛. 仿蝎子振源定位机理的位置指纹室内定位方法[J]. 吉林大学学报(工学版), 2019, 49(6): 2076-2082.
[5] 黄岚, 李玉, 王贵参, 王岩. 基于点距离和密度峰值聚类的社区发现方法[J]. 吉林大学学报(工学版), 2016, 46(6): 2042-2051.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!