基于K近邻和多类合并的密度峰值聚类算法

吉林大学学报(理学版) ›› 2019, Vol. 57 ›› Issue (1): 111-120.

基于K近邻和多类合并的密度峰值聚类算法

薛小娜¹, 高淑萍¹, 彭弘铭², 吴会会¹

1. 西安电子科技大学数学与统计学院, 西安 710071;
2. 西安电子科技大学通信工程学院, 西安 710071

收稿日期:2017-12-16 出版日期:2019-01-26 发布日期:2019-02-08
通讯作者: 薛小娜 E-mail:xiaona_xue@163.com

Density Peaks Clustering Algorithm Based onKNearest Neighbors and ClassesMerging#br#

XUE Xiaona¹, GAO Shuping¹, PENG Hongming², WU Huihui¹

1. School of Mathematics and Statistics, Xidian University, Xi’n 710071, China;
2. School of Telecommunications Engineering, Xidian University, Xi’an 710071， China

Received:2017-12-16 Online:2019-01-26 Published:2019-02-08
Contact: XUE Xiaona E-mail:xiaona_xue@163.com

Abstract: The density peaks clustering (DPC) algorithm performs poorly on datasets with complex structure, high dimensionality, or several density peaks within the same class. To address this, we propose a density peaks clustering algorithm based on K-nearest neighbors and classes merging (KM-DPC). First, the sample distribution is described with a newly defined local density, and cluster centers are selected with a new evaluation index. Second, an iterative assignment strategy based on the K-nearest-neighbor idea is designed to assign the remaining points accurately. Finally, a local class-merging method is presented to prevent a class that contains several density peaks from being split. Simulation results show that KM-DPC clearly outperforms DPC on 22 different datasets.
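The abstract names three stages (local density and center selection, kNN-based iterative assignment, local class merging) but does not give their formulas. As a rough illustration of how such a pipeline fits together, here is a minimal pure-Python sketch under loudly stated assumptions; it is not the authors' KM-DPC. The density ρ_i = exp(−(1/K)·Σ_{j∈kNN(i)} d_ij²) is one kNN-based choice from the DPC literature, centers are ranked by the original DPC decision value γ_i = ρ_i·δ_i in place of the paper's new index, unlabeled points take the majority label of their already-labeled k nearest neighbors, and two clusters are merged when at least `min_links` mutual-kNN pairs connect them. All function and parameter names (`km_dpc_sketch`, `k`, `n_centers`, `min_links`) are hypothetical.

```python
import math
from collections import Counter


def pairwise_dist(X):
    """Euclidean distance matrix for a list of equal-length point tuples."""
    return [[math.dist(a, b) for b in X] for a in X]


def knn_indices(D, k):
    """Indices of the k nearest neighbours of each point, excluding the point itself."""
    n = len(D)
    return [sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])[:k]
            for i in range(n)]


def knn_density(D, knn):
    """ASSUMED density: rho_i = exp(-mean squared distance to the k nearest neighbours)."""
    return [math.exp(-sum(D[i][j] ** 2 for j in nb) / len(nb)) for i, nb in enumerate(knn)]


def compute_delta(D, rho):
    """Standard DPC delta: distance to the nearest point of strictly higher density."""
    delta = []
    for i in range(len(D)):
        higher = [j for j in range(len(D)) if rho[j] > rho[i]]
        # Points with no denser point get their largest distance, as in DPC.
        delta.append(min(D[i][j] for j in higher) if higher else max(D[i]))
    return delta


def pick_centers(rho, delta, n_centers):
    """ASSUMED stand-in for the paper's index: rank by DPC's gamma = rho * delta."""
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(rho)), key=lambda i: gamma[i], reverse=True)[:n_centers]


def knn_assign(D, knn, centers):
    """ASSUMED iterative strategy: label points by majority vote of labelled kNN neighbours."""
    n = len(D)
    labels = [-1] * n
    for c, idx in enumerate(centers):
        labels[idx] = c
    while -1 in labels:
        updates = {}
        for i in range(n):
            if labels[i] == -1:
                votes = Counter(labels[j] for j in knn[i] if labels[j] != -1)
                if votes:
                    updates[i] = votes.most_common(1)[0][0]
        if not updates:
            # No unlabelled point has a labelled neighbour: copy the nearest labelled point.
            done = [j for j in range(n) if labels[j] != -1]
            for i in range(n):
                if labels[i] == -1:
                    updates[i] = labels[min(done, key=lambda j: D[i][j])]
        for i, lab in updates.items():
            labels[i] = lab
    return labels


def merge_clusters(labels, knn, min_links):
    """ASSUMED merge rule: join clusters linked by >= min_links mutual-kNN pairs."""
    links = Counter()
    for i, nb in enumerate(knn):
        for j in nb:
            if i < j and i in knn[j] and labels[i] != labels[j]:
                links[frozenset((labels[i], labels[j]))] += 1
    parent = {c: c for c in set(labels)}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for pair, count in links.items():
        if count >= min_links:
            a, b = sorted(pair)
            parent[find(b)] = find(a)
    roots = sorted({find(c) for c in parent})
    remap = {r: new for new, r in enumerate(roots)}
    return [remap[find(c)] for c in labels]


def km_dpc_sketch(X, k=5, n_centers=3, min_links=2):
    """Hypothetical pipeline: density -> centers -> kNN assignment -> merging."""
    D = pairwise_dist(X)
    knn = knn_indices(D, k)
    rho = knn_density(D, knn)
    centers = pick_centers(rho, compute_delta(D, rho), n_centers)
    return merge_clusters(knn_assign(D, knn, centers), knn, min_links)


if __name__ == "__main__":
    plus = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    X = plus + [(x + 20, y) for x, y in plus]  # two 5-point "plus" shapes, 20 units apart
    # n_centers=3 over-selects a second peak in the left shape; merging rejoins it.
    print(km_dpc_sketch(X, k=2, n_centers=3, min_links=2))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The demo deliberately requests three centers on data with two natural groups, so one group is first split between two peaks and then rejoined by the merging step, which mirrors the failure mode the abstract says class merging is meant to prevent. The `min_links` threshold is an illustrative knob, not a value taken from the paper.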

Key words: clustering, local density, density peak, K-nearest neighbor, classes merging

CLC number: TP181

XUE Xiaona, GAO Shuping, PENG Hongming, WU Huihui. Density Peaks Clustering Algorithm Based on K-Nearest Neighbors and Classes Merging[J]. Journal of Jilin University (Science Edition), 2019, 57(1): 111-120.
