Journal of Jilin University (Science Edition), 2026, Vol. 64, Issue (2): 359-369.


Density Peak Clustering Algorithm Based on Adaptive Hierarchical Shared Neighbors

DU Ruishan1,2, LU Borui1, MENG Lingdong2, JIANG Nan3, ZHANG Yunbai4

  1. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang Province, China; 2. Key Laboratory of Oil and Gas Reservoir and Underground Gas Storage Integrity Evaluations (Northeast Petroleum University), Daqing 163318, Heilongjiang Province, China; 3. Herbert Business School, University of Miami, Coral Gables 33146, Florida, USA; 4. Data Science Institute, Columbia University, New York 10027, USA
  • Received: 2024-10-28 Online: 2026-03-26 Published: 2026-03-26
  • Corresponding author: DU Ruishan E-mail: durs918@163.com

Abstract: To address the limitations of the traditional density peaks clustering algorithm, namely its neglect of density differences between clusters, its need for a preset number of clusters, and its reliance on a single allocation strategy, we propose a density peak clustering algorithm based on adaptive hierarchical shared neighbors. First, the similarity between samples is computed from adaptive shared neighbors with hierarchically increasing weights, and local density and relative distance are redefined accordingly. Second, second-order derivatives are used to identify inflection points, and weighted triangle areas computed from the inflection-point information are used to select cluster centers automatically. Finally, the similarity matrix is combined with relative distance in a secondary allocation step to reduce the effect of chain reactions. Experimental results on nine artificial datasets and nine UCI real-world datasets show that the proposed algorithm generally outperforms the density peaks clustering algorithm and its improved variants in clustering performance, exhibits higher accuracy and robustness, and is well suited to clustering data with complex distributions.

Key words: density peak clustering, hierarchical shared neighbor, local density, clustering center, assignment policy
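
As a rough illustration of the first step, the sketch below computes a plain shared-nearest-neighbor similarity and derives the two density-peak quantities (local density and relative distance) from it. It is a simplified stand-in, not the authors' method: the fixed neighborhood size `k`, the unweighted neighbor count, and the function names `knn_sets`, `snn_similarity` and `density_and_delta` are all assumptions made here. The adaptive neighborhood and the hierarchical weights are not specified in the abstract and are not reproduced.

```python
import math


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors.

    Assumption: a fixed k and Euclidean distance; the point itself is excluded.
    """
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors):
    """Unweighted shared-neighbor similarity: size of the common neighbor set."""
    n = len(neighbors)
    return [
        [len(neighbors[i] & neighbors[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]


def density_and_delta(points, sim):
    """Local density (row sum of similarity) and relative distance.

    The relative distance follows the original density peaks definition:
    distance to the nearest point of strictly higher density, or the largest
    distance to any point when no denser point exists.
    """
    n = len(points)
    rho = [sum(row) for row in sim]
    delta = []
    for i in range(n):
        to_denser = [
            math.dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]
        ]
        if to_denser:
            delta.append(min(to_denser))
        else:
            delta.append(max(math.dist(points[i], points[j]) for j in range(n)))
    return rho, delta
```

A call such as `rho, delta = density_and_delta(points, snn_similarity(knn_sets(points, k=2)))` returns one density and one distance per sample. Points in different, well-separated groups share no neighbors, so their similarity is zero in this simplified version.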

CLC number: TP311.13
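
The automatic center-selection idea can be illustrated in a similarly hedged way. The sketch below sorts the decision value `gamma = rho * delta` (the product used in the original density peaks algorithm) in descending order and cuts the list at the position with the largest discrete second difference. The function name `select_centers` is hypothetical, and the weighted triangle area mentioned in the abstract is not reproduced because the abstract does not define it.

```python
def select_centers(rho, delta):
    """Pick candidate cluster centers from a knee in the sorted decision values.

    Assumption: the knee is the position with the largest second difference
    of the descending gamma curve; everything before it is taken as a center.
    """
    gamma = [r * d for r, d in zip(rho, delta)]
    # Indices ordered from the largest decision value to the smallest.
    order = sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)
    g = [gamma[i] for i in order]
    if len(g) < 3:
        # Too few points to form a second difference; keep the top one.
        return order[:1]
    # Discrete second difference at interior positions 1 .. n-2.
    second = [g[t - 1] - 2 * g[t] + g[t + 1] for t in range(1, len(g) - 1)]
    # Position (in the sorted list) where the curve bends most sharply upward.
    knee = 1 + max(range(len(second)), key=lambda t: second[t])
    return order[:knee]
```

Used with the densities and distances of a dataset, `select_centers(rho, delta)` returns the indices of the samples whose decision values lie before the sharpest bend. This is only one simple way to locate such a bend.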