吉林大学学报(理学版) ›› 2023, Vol. 61 ›› Issue (5): 1131-1138.

• • 上一篇    下一篇

基于min-max准则与区域划分的I-k-means-+聚类算法

曲福恒1, 宋剑飞1, 杨勇1,2, 胡雅婷3, 潘曰涛1   

  1. 1. 长春理工大学 计算机科学技术学院, 长春 130022;2. 长春师范大学 教育学院, 长春 130032; 3. 吉林农业大学 信息技术学院, 长春 130118
  • 收稿日期:2022-12-05 出版日期:2023-09-26 发布日期:2023-09-26
  • 通讯作者: 杨勇 E-mail:yy@cust.edu.cn

I-k-means-+ Clustering Algorithm Based on min-max Criterion and Region Division

QU Fuheng1, SONG Jianfei1, YANG Yong1,2, HU Yating3, PAN Yuetao1   

  1. 1. College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China; 2. College of Education, Changchun Normal University, Changchun 130032, China; 3. College of Information Technology, Jilin Agricultural University, Changchun 130118, China
  • Received:2022-12-05 Online:2023-09-26 Published:2023-09-26

摘要: 针对I-k-means-+算法聚类结果不稳定、求解精度较低的问题, 提出一种基于min-max准则与区域划分的I-k-means-+聚类算法. 首先, 提出min-max准则, 计算每个数据点到最近中心的距离, 优先选择距离最大的数据点作为新的聚类中心, 避免多个初始中心聚集在同一个簇中的情况; 其次, 将分裂簇中的数据点分割到不同区域, 在每个区域中选取一个数据点作为候选中心, 以增加候选中心的多样性; 最后, 对于配对失败的簇, 通过增益重新选择新的分裂簇与原删除簇再次配对, 以提高配对成功率, 进一步降低目标函数值. 实验结果表明, 与I-k-means-+算法相比, 本文算法在运行效率基本相当的前提下, 求解精度平均提高6.47%, 且聚类结果更稳定;与k-means、k-means++算法相比, 本文算法的求解精度更高.

关键词: 聚类分析, k-means算法, I-k-means-+算法, min-max准则, 区域划分

Abstract: Aiming at the problem of unstable clustering results and low solving accuracy of  I-k-means-+ algorithm, we proposed I-k-means-+ clustering algorithm based on min-max criterion and region division. Firstly, the min-max criterion was proposed to calculate the distance from each data point to the nearest center, and the data point with the largest distance was preferentially selected as the new clustering center to avoid multiple initial centers gathering in the same cluster. Secondly, the data points in the split cluster were divided into different regions, and a data point was selected as the candidate center in each region to increase the diversity of the candidate center. Finally, for the clusters that failed to pair, the new split cluster was re-selected by gain to pair with the original deleted cluster again, so as to improve the pairing success rate and further reduce the objective function value. The experimental results show that compared with the I-k-means-+ algorithm, the proposed algorithm improves the accuracy of the solution by 6.47% on average while maintaining similar operational efficiency, and the clustering results are more stable. Compared with k-means and k-means++ algorithms, the proposed algorithm has higher solving accuracy.

Key words: cluster analysis, k-means algorithm, I-k-means-+ algorithm, min-max criterion, region division

中图分类号: 

  • TP391