Journal of Jilin University(Engineering and Technology Edition) ›› 2024, Vol. 54 ›› Issue (7): 2038-2048.doi: 10.13229/j.cnki.jdxbgxb.20221161

Previous Articles    

Peak clustering algorithm for uncertain data density based on JS divergence

Song LI1(),Xiao-nan LIU1,Juan LIU2   

  1. 1.School of Computer Science and Technology,Harbin University of Science and Technology,Harbin 150080,China
    2.Strategic Research,Qi An Xin Technology Group Inc,Beijing 100088,China
  • Received:2022-09-08 Online:2024-07-01 Published:2024-08-05

Abstract:

Aiming at the defects of traditional density-based uncertain clustering algorithm, such as parameter sensitivity and poor clustering results for complex manifold uncertain data sets, a new uncertain data density peak clustering algorithm based on JS divergence (UDPC-JS) was proposed. The algorithm first uses the uncertain natural neighborhood density factor defined by uncertain natural neighbors to remove noise points; secondly, the local density of uncertain data objects is calculated by combining uncertain natural neighbors and JS divergence. Then, the initial clustering center of uncertain data sets is found by combining the idea of representative points, and the distance based on JS divergence and graph is defined between the initial clustering centers. Then, the local density calculated based on uncertain natural neighbors and JS divergence and the newly defined distance based on JS divergence and graph between the initial clustering centers are used to construct the decision graph on the initial clustering center, and the final clustering center is selected according to the decision graph. Finally, the unassigned uncertain data objects are assigned to the cluster where their initial clustering centers are located. The experimental results show that the algorithm has better clustering effect and accuracy than the comparison algorithm and has greater advantages in dealing with uncertain data sets of complex manifolds.

Key words: uncertain data, uncertain natural neighbors, JS divergence, density peak, clustering

CLC Number: 

  • TP311

Fig.1

Effect of data uncertain factors on results"

Table 1

Laboratory datasets from UCI"

数据集InstancesDemensionalityClusters
Iris15043
Wine178133
Seeds21073
Segmentation2 310197
Habernan30632
Heart270132

Table 2

Comparison of various evaluation indicators of each clustering algorithm on UCI experimental dataset"

数据集算法F-MeasureSilhouette
IrisUDPC-JS0.90110.8346
FDBSCAN0.85220.7612
UK-medoids0.86510.7703
FOPTICS0.83750.6953
FDBSCAN-KL0.88640.8162
NBC-JS0.89330.8269
WineUDPC-JS0.78460.8818
FDBSCAN0.64510.7059
UK-medoids0.67320.7187
FOPTICS0.70450.6042
FDBSCAN-KL0.77530.7354
NBC-JS0.74650.7245
SeedsUDPC-JS0.90550.7887
FDBSCAN0.70420.7567
UK-medoids0.70110.6432
FOPTICS0.82310.7442
FDBSCAN-KL0.81560.7465
NBC-JS0.88420.7885
SegmentationUDPC-JS0.47980.3458
FDBSCAN0.20580.2060
UK-medoids0.44010.3264
FOPTICS0.22750.2780
FDBSCAN-KL0.40140.3155
NBC-JS0.32210.2998
HabernanUDPC-JS0.78440.6828
FDBSCAN0.66430.6132
UK-medoids0.65400.6279
FOPTICS0.75530.6478
FDBSCAN-KL0.79010.6210
NBC-JS0.74620.6732
HeartUDPC-JS0.79950.7887
FDBSCAN0.66540.7321
UK-medoids0.76120.7421
FOPTICS0.70950.6653
FDBSCAN-KL0.75900.6043
NBC-JS0.66320.7963

Fig.2

Clustering accuracy comparison diagram of each clustering algorithm on UCI laboratory data set"

Fig.3

Impact of data volume on the CPU runningtime of each algorithm"

Fig.4

Influence of data dimension on CPU running time of each algorithm"

Fig.5

Impact of data volume on the clustering accuracy of each algorithm"

Fig.6

Impact of date dimension on the clustering accuracy of each algorithm"

1 李松, 王冠群, 郝晓红, 等. 面向推荐系统的多目标决策优化算法[J].西安交通大学学报, 2022, 56(8):104-112.
Li Song, Wang Guan-qun, Hao Xiao-hong, et al. A multi-objective decision optimization algorithm for recommendation system[J]. Journal of Xi´an Jiao Tong University, 2022, 56(8):104-112.
2 Khan S S, Ahmad A. Cluster center initialization algorithm for K-means clustering[J]. Pattern Recognition Letters, 2004, 25(11): 1293-1302.
3 Chau M, Cheng R, Kao B, et al. Uncertain data mining: an example in clustering location data[C]∥ Pacific-Asia Conference on Knowledge Discovery and Data Mining. Berlin:Springer, 2006: 199-204.
4 Kaufman L, Rousseeuw P J. An Introduction to Cluster Analysis[M]. London:John Wiley and Sons, Incorporated, 1990.
5 Gullo F, Ponti G, Tagarelli A. Clustering uncertain data via k-medoids[C]∥Proceedings of the 2nd international conference on Scalable Uncertainty Management, Berlin,Germany,2008: 229-242.
6 Tran L, Duckstein L. Comparison of fuzzy numbers using a fuzzy distance measure[J]. Fuzzy Sets and Systems, 2002, 130(3): 331-341.
7 Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C]∥The 2nd International Conference on Knowledge Discovery and Data Mining, Portland,USA,1996: 226-231.
8 Kriegel H P, Pfeifle M. Hierarchical density-based clustering of uncertain data[C]∥Fifth IEEE International Conference on Data Mining,Houston, USA,2005: 672-677.
9 Ankerst M, Breunig M M, Kriegel H P, et al. OPTICS: ordering points to identify the clustering structure[J]. ACM Sigmod record, 1999, 28(2): 49-60.
10 Liu H, Zhang X, Zhang X, et al. Self-adapted mixture distance measure for clustering uncertain data[J]. Knowledge-Based Systems, 2017, 126: 33-47.
11 Rodriguez A, Laio A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492-1496.
12 Ni L, Luo W, Bu C, et al. Improved CFDP algorithms based on shared nearest neighbors and transitive closure[C]∥Pacific-Asia Conference on Knowledge Discovery and Data Mining,Tokyo, Japan, 2017: 79-93.
13 Guo Z, Huang T, Cai Z, et al. A new local density for density peak clustering[C]∥Pacific-Asia Conference on Knowledge Discovery and Data Mining,Tokyo, Japan,2018: 426-438.
14 纪霞, 姚晟, 赵鹏. 相对邻域与剪枝策略优化的密度峰值聚类算法[J]. 自动化学报, 2019, 45(4): 1-14.
Ji Xia, Yao Sheng, Zhao Peng. Density peak clustering algorithm optimized by relative neighborhood and pruning strategy[J]. Acta Automation Sinica, 2019, 45(4): 1-14.
15 Wu Y, He Y, Huang J Z.Clustering ensembles based on probability density function estimation[C]∥The 7th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/The 6th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), New York,USA,2020 : 126-131.
16 Yang L, Bi S, Faes M G R, et al. Bayesian inversion for imprecise probabilistic models using a novel entropy-based uncertainty quantification metric[J]. Mechanical Systems and Signal Processing, 2022, 162: No.107954.
17 Kingetsu Y, Hamasuna Y. Jensen–Shannon divergence-based k-medoids clustering[J]. Journal of Advanced Computational Intelligence and Intelligent Informatics, 2021, 25(2): 226-233.
18 Yang L, Zhu Q, Huang J, et al. Adaptive edited natural neighbor algorithm[J]. Neurocomputing, 2017, 230: 427-433.
19 Dai Q Z, Xiong Z Y, Xie J, et al. A novel clustering algorithm based on the natural reverse nearest neighbor structure[J]. Information Systems, 2019, 84: 1-16.
20 Zhou S, Zhao Y, Guan J, et al. A neighborhood-based clustering algorithm[C]∥Pacific-Asia Conference on Knowledge Discovery and Data Mining,Tokyo, Japan, 2005: 361-371.
21 Huang J, Zhu Q, Yang L, et al. QCC: a novel clustering algorithm based on Quasi-Cluster centers[J]. Machine Learning, 2017, 106(3): 337-357.
22 Tenenbaum J B, de Silva V, Langford J C. A global geometric framework for nonlinear dimensionality reduction[J]. Science, 2000, 290(5500): 2319-2323.
23 Cheng D, Zhang S, Huang J. Dense members of local cores-based density peaks clustering algorithm[J]. Knowledge-Based Systems, 2020,193:No.105454.
24 Jiang B, Pei J, Tao Y, et al. Clustering uncertain data based on probability distribution similarity[J]. IEEE Transactions on Knowledge and Data Engineering, 2011, 25(4): 751-763.
25 Cai Y, Zhang Y, Qu J, et al. Differential privacy preserving dynamic data release scheme based on Jensen-Shannon divergence[J]. China Communications, 2022, 19(6): 11-21.
[1] Xi-jun ZHANG,Guang-jie YU,Yong CUI,Ji-yang SHANG. Short-term traffic flow prediction based on clustering algorithm and graph neural network [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(6): 1593-1600.
[2] Di LIU,Yao SUN,Yun-feng HU,Hong CHEN. Commercial vehicle formation strategy based on density clustering [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(5): 1459-1468.
[3] Li LYU,Mei-zi ZHU,Ping KANG,Long-zhe HAN. Density peaks clustering with second-order K-nearest neighbors and multi-cluster merging [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(5): 1417-1425.
[4] Gui-zhen CHEN,Hui-ting CHENG,Cai-hua ZHU,Yu-ran LI,Yan LI. A risk evaluation method for urban intersections considering drivers' physiological information [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(5): 1277-1284.
[5] Xi-guang ZHANG,Long-fei ZHANG,Yu-xi MA,Yin-ting FAN. Design of fuzzy clustering algorithm for massive cloud data based on density peak [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(5): 1401-1406.
[6] Fu-heng QU,Yue-tao PAN,Yong YANG,Ya-ting HU,Jian-fei SONG,Cheng-yu WEI. An efficient global K-means clustering algorithm based on weighted space partitioning [J]. Journal of Jilin University(Engineering and Technology Edition), 2024, 54(5): 1393-1400.
[7] Shi-jun SONG,Min FAN. Design of big data anomaly detection model based on random forest algorithm [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(9): 2659-2665.
[8] Liang FAN,Ying-ming XU,Yang TAN. Interface slip calculation of prefabricated steel⁃concrete composite beams with clustering studs [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(9): 2533-2541.
[9] Ya-li ZHANG,Rui FU,Wei YUAN,Ying-shi GUO. Classification and recognition model of entering and leaving stops' driving style considering energy consumption [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(7): 2029-2042.
[10] Zhuang-zhuang LIU,Wen-qing ZHENG,Jian ZHENG,Yi-zheng LI,Peng-yu JI,Ai-min SHA. Pavement surface temperature monitoring method based on gridding approach [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(6): 1746-1755.
[11] Yao-long KANG,Li-lu FENG,Jing-an ZHANG,Su-e CAO. Fast outlier mining algorithm in uncertain data set based on spectral clustering [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(4): 1181-1186.
[12] Rong-han YAO,Wen-tao XU,Wei-wei GUO. Drivers' takeover behavior and intention recognition based on factor and long short⁃term memory [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(3): 758-771.
[13] Jian-cheng WENG,Rui-cong WEI,Han-mei HE,Hai-hui XU,Jing-jing WANG. Urban road network short-term traffic flow prediction model based on associated road chain group [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(11): 3104-3112.
[14] Yi-min MAO,Sen-qing GU. Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(10): 2909-2916.
[15] Si-bao WANG,Zhong-zheng GUO,Chi MA,Shi-long WANG. Thermal-force deformation analysis and prediction modeling of CNC gear hobbing machine workbench [J]. Journal of Jilin University(Engineering and Technology Edition), 2023, 53(10): 2761-2772.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!