Journal of Jilin University (Science Edition) ›› 2023, Vol. 61 ›› Issue (4): 899-908.


Two Stage Ensemble Algorithm Based on Clustering Quality

YAN Chen, YANG Youlong, LIU Yuanyuan

  1. School of Mathematics and Statistics, Xidian University, Xi'an 710126, China
  • Received: 2022-05-31 Online: 2023-07-26 Published: 2023-07-26
  • Corresponding author: YANG Youlong E-mail: ylyang@mail.xidian.edu.cn

Abstract: Existing ensemble clustering algorithms usually use the K-means algorithm as the default base clustering generator. Although this ensures diversity among the clustering members, it ignores the fact that poor base clusterings may severely disturb the final clustering result. To address this problem, we propose a two-stage ensemble algorithm based on clustering quality. Because K-means runs efficiently but produces relatively coarse clusterings, the generation stage first uses K-means to generate the base clustering members and then selects members with both high quality and strong diversity through a group agreement measure to form a candidate ensemble. Next, in the ensemble stage, information entropy is applied to construct a co-association matrix weighted by base clustering. Finally, a consensus function is applied to obtain the final clustering result. Comparative experiments using three indexes on ten real datasets show that the algorithm effectively improves the accuracy of the clustering results while maintaining good robustness.

Key words: ensemble clustering, clustering quality, group agreement, information entropy, consensus function
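
The abstract does not give the formulas for the group agreement measure or the entropy-based weights, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' method. It assumes the base clusterings are already available as label lists (for example from repeated K-means runs), uses mean normalized mutual information (NMI) with the other members as a stand-in agreement score, keeps members scoring at least the mean (a hypothetical selection rule that ignores diversity), and reuses the scores as weights in the co-association matrix. All function names are illustrative. The consensus function, such as hierarchical clustering on the matrix, is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information between two label lists."""
    n = len(a)
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(
        (c / n) * math.log(c * n / (ca[x] * cb[y]))
        for (x, y), c in joint.items()
    )
    denom = math.sqrt(entropy(a) * entropy(b))
    return mi / denom if denom > 0 else 0.0


def agreement_scores(clusterings):
    """Assumed stand-in score: mean NMI with the other members (needs >= 2)."""
    m = len(clusterings)
    return [
        sum(nmi(clusterings[i], clusterings[j]) for j in range(m) if j != i)
        / (m - 1)
        for i in range(m)
    ]


def select_candidates(clusterings, scores):
    """Keep members scoring at least the mean score (a hypothetical rule)."""
    threshold = sum(scores) / len(scores)
    return [(c, s) for c, s in zip(clusterings, scores) if s >= threshold]


def weighted_co_association(clusterings, weights):
    """Entry (p, q) is the normalized weight of members grouping p with q."""
    n = len(clusterings[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(clusterings, weights):
        for p in range(n):
            for q in range(n):
                if labels[p] == labels[q]:
                    matrix[p][q] += w / total
    return matrix


# Three toy base clusterings of six points; the third disagrees with the rest.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, labels permuted
    [0, 1, 0, 1, 0, 1],
]
scores = agreement_scores(base)
kept = select_candidates(base, scores)
matrix = weighted_co_association([c for c, _ in kept], [s for _, s in kept])
```

In this toy run the third clustering falls below the mean score and is dropped, so points 0 and 1 end up with co-association close to 1 and points 0 and 3 with 0.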

CLC Number: 

  • TP391