Density Peak Clustering Algorithm Based on Adaptive Hierarchical Shared Neighbors

Journal of Jilin University (Science Edition), 2026, Vol. 64, Issue 2: 359-369.

DU Ruishan¹,², LU Borui¹, MENG Lingdong², JIANG Nan³, ZHANG Yunbai⁴

1. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang Province, China; 2. Key Laboratory of Oil and Gas Reservoir and Underground Gas Storage Integrity Evaluations (Northeast Petroleum University), Daqing 163318, Heilongjiang Province, China; 3. Herbert Business School, University of Miami, Coral Gables 33146, Florida, USA; 4. Data Science Institute, Columbia University, New York 10027, USA

Received: 2024-10-28 Online: 2026-03-26 Published: 2026-03-26
Corresponding author: DU Ruishan, E-mail: durs918@163.com

Abstract

The traditional density peaks clustering (DPC) algorithm has three shortcomings. It ignores density differences between clusters, it requires the number of clusters to be specified in advance, and it relies on a single assignment strategy. To address these, we propose a density peak clustering algorithm based on adaptive hierarchical shared neighbors. First, the similarity between samples is computed from adaptively determined shared neighbors with hierarchically increasing weights, and the local density and relative distance are redefined on this basis. Second, second-order derivatives are introduced to identify inflection points, and weighted triangle areas computed from the inflection-point information are used to select cluster centers automatically. Finally, a secondary assignment that combines the similarity matrix with the relative distance reduces the effect of chain reactions. In a chain reaction, one misassigned point causes the points assigned through it to be misassigned as well. Experiments were run on nine artificial datasets and nine real-world UCI datasets. The results show that the proposed algorithm generally outperforms DPC and its improved variants in clustering performance, with higher accuracy and robustness. They also show that it is well suited to clustering data with complex distributions.

Key words: density peak clustering, hierarchical shared neighbors, local density, clustering center, assignment strategy
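The abstract gives no formulas. The code below is therefore only a minimal sketch of the classic DPC pipeline whose stages the paper modifies, written in plain Python 3.8+ (standard library only). It covers a shared-nearest-neighbour (SNN) similarity, a local density ρ and relative distance δ, an automatic choice of centers from a second-order difference of the sorted decision values γ = ρ·δ, and the single-pass assignment that causes chain reactions.

All function names here (`knn_lists`, `snn_similarity`, `density_and_delta`, `knee_center_count`, `dpc_sketch`) are hypothetical. The SNN-sum density and the "largest second difference" knee rule are illustrative stand-ins chosen for this sketch. The authors' hierarchical weighting, weighted triangle area and secondary assignment are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_lists(points: List[Point], k: int) -> List[List[int]]:
    """Indices of each point's k nearest neighbours (itself excluded; ties broken by index)."""
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    return neighbours


def snn_similarity(knn: List[List[int]]) -> List[List[int]]:
    """Plain shared-nearest-neighbour count |KNN(i) & KNN(j)|, without any hierarchical weights."""
    sets = [set(nb) for nb in knn]
    n = len(sets)
    return [[len(sets[i] & sets[j]) if i != j else 0 for j in range(n)] for i in range(n)]


def density_and_delta(
    points: List[Point], knn: List[List[int]], snn: List[List[int]]
) -> Tuple[List[float], List[float], List[int], List[int]]:
    """Illustrative rho_i = sum of SNN(i, j) over j in KNN(i); delta_i = distance to nearest denser point."""
    n = len(points)
    rho = [float(sum(snn[i][j] for j in knn[i])) for i in range(n)]
    # Rank by decreasing density; equal densities are ordered by index so "denser" is well defined.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n  # nearest denser point of each sample (-1 for the densest one)
    top = order[0]
    delta[top] = max(math.dist(points[top], q) for q in points)  # classic DPC convention
    for rank in range(1, n):
        i = order[rank]
        best = min(order[:rank], key=lambda j: math.dist(points[i], points[j]))
        parent[i] = best
        delta[i] = math.dist(points[i], points[best])
    return rho, delta, parent, order


def knee_center_count(gamma_desc: List[float]) -> int:
    """Generic knee rule (NOT the paper's weighted triangle area): the position i of the largest
    discrete second difference g[i-1] - 2*g[i] + g[i+1] of the descending decision values."""
    if len(gamma_desc) < 3:
        return 1
    second = {
        i: gamma_desc[i - 1] - 2 * gamma_desc[i] + gamma_desc[i + 1]
        for i in range(1, len(gamma_desc) - 1)
    }
    return max(second, key=second.get)  # the i largest values lie before the knee


def dpc_sketch(points: List[Point], k: int = 3) -> List[int]:
    """Classic DPC pipeline with an SNN-based density and an automatic (knee) choice of centers."""
    knn = knn_lists(points, k)
    rho, delta, parent, order = density_and_delta(points, knn, snn_similarity(knn))
    gamma = [r * d for r, d in zip(rho, delta)]
    ranked = sorted(range(len(points)), key=lambda i: -gamma[i])
    centers = ranked[: knee_center_count([gamma[i] for i in ranked])]
    if order[0] not in centers:
        centers.append(order[0])  # the densest point has no denser parent, so it must be a center
    labels = [-1] * len(points)
    for cluster_id, idx in enumerate(centers):
        labels[idx] = cluster_id
    # Single-pass DPC assignment: in decreasing density, each non-center point inherits
    # the label of its nearest denser point (the step where chain errors can propagate).
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(dpc_sketch(pts, k=3))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The plain SNN count produces many tied densities, so the sketch breaks ties by index. A continuous or weighted density would reduce such ties.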

CLC number: TP311.13

DU Ruishan, LU Borui, MENG Lingdong, JIANG Nan, ZHANG Yunbai. Density Peak Clustering Algorithm Based on Adaptive Hierarchical Shared Neighbors[J]. Journal of Jilin University (Science Edition), 2026, 64(2): 359-369.
