Clustering Algorithm for Uncertain Data Based on Peak Density

Journal of Jilin University (Information Science Edition), 2026, Vol. 44, Issue (2): 392-398.

LANG Jiayun, DING Xiaomei

School of Computer Engineering, Anhui Wenda Information Engineering College, Hefei 231201, China

Received: 2024-05-26 Online: 2026-04-14 Published: 2026-04-14
About the author: LANG Jiayun (1986- ), female, born in Hanshan, Anhui, lecturer at Anhui Wenda Information Engineering College, mainly engaged in research on computer science and technology and Internet of Things technology, (Tel) 86-19956519660, (E-mail) jiangliao698@163.com.
Supported by:
Provincial Quality Engineering Fund Project of Anhui Higher Education Institutions (2020jyxm0757); Anhui Provincial Fund Project for Cultivating Outstanding Top Talents in Universities (gxyq2021239)

Abstract:

Uncertain data sets are often large, and the limited accuracy of cluster partitioning lowers clustering efficiency. To address this problem, a clustering algorithm for uncertain data based on density peaks is proposed. First, the Mahalanobis distance method is used to remove interfering samples with low correlation, and the missing values of uncertain data samples are calculated through entropy and restored step by step in reverse. Second, the density peak calculation method is used to determine the distribution of cluster centers, and a decision graph is introduced to partition the data into clusters. Finally, the K-nearest neighbor idea is used to obtain trust values for the samples that are not cluster centers; data points whose trust values differ markedly from the rest of their cluster, together with noise points, are identified and partitioned a second time, which completes the optimization of the density peak clustering method. The experimental results show that, on large-scale data, the method still partitions clusters accurately and requires little clustering time. The proposed method has high computational efficiency and is significant for uncertain data mining and analysis.

Key words:

Mahalanobis distance method, entropy value, decision graph, density peak, K-nearest neighbor idea
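
To make the density peak and decision graph step of the abstract concrete, the following is a minimal Python sketch of the baseline density peak clustering procedure (cutoff-kernel local density, distance to the nearest denser point, and their product as a decision-graph score). It is not the authors' optimized algorithm: the Mahalanobis filtering, the entropy-based restoration of missing values, and the K-nearest-neighbor trust values are omitted because their formulas are not given on this page. The function name `density_peaks`, the parameters `d_c` and `n_centers`, and the sample points are assumptions made for illustration.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Baseline density peak clustering (cutoff kernel); illustrative only."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point that precedes i in that order.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # usual convention for the densest point
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Decision-graph score: cluster centers have both large rho and large delta.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_centers]

    labels = [-1] * n
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    # Every remaining point inherits the label of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels


# Hypothetical sample: two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta, labels = density_peaks(points, d_c=1.2, n_centers=2)
print(labels)
```

In this sketch the number of centers is passed in explicitly; in practice the decision graph (rho against delta) is usually inspected to choose it. The pairwise distance table costs O(n^2) time and memory, so this form is suitable only for small illustrative inputs.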

CLC number:

TP301.6

LANG Jiayun, DING Xiaomei. Clustering Algorithm for Uncertain Data Based on Peak Density [J]. Journal of Jilin University (Information Science Edition), 2026, 44(2): 392-398.