Journal of Jilin University (Science Edition), 2019, Vol. 57, Issue (06): 1431-1436.

• Computer Science •

An Optimized K-means++ Algorithm Guided by Local Probability

WANG Haiyan1,2, CUI Wenchao3, XU Peidi3, LI Chuang3   

  1. College of Computer Science and Technology, Changchun University, Changchun 130022, China; 2. Institute of Theoretical Chemistry, Jilin University, Changchun 130021, China; 3. College of Computer, Jilin Normal University, Siping 136000, Jilin Province, China
  • Received: 2019-04-28 Online: 2019-11-26 Published: 2019-11-21
  • Contact: WANG Haiyan E-mail: jlsdwhy_0820@sina.cn

摘要: 针对K-means++算法选取初始聚类中心计算误差平方和时, 实验次数对误差平方影响不准确的问题, 提出一种PK-means++算法. 结果表明, 该算法在进行分散数据聚类时, 在同一K值情形下, 聚类后的误差平方和较原K-means++算法更稳定, 从而更好地保证了随机实验取值的稳定性.

关键词: 聚类分析, K-means++算法, 概率, 误差平方和

Abstract: To address the problem that, when the K-means++ algorithm selects the initial cluster centers and the sum of squared errors (SSE) is computed, the number of experimental runs has an unreliable effect on the SSE, we propose the PK-means++ algorithm. The results show that, when clustering scattered data with the same value of K, the SSE after clustering is more stable than that of the original K-means++ algorithm, so the stability of values obtained from random experiments is better guaranteed.

Key words: clustering analysis, K-means++ algorithm, probability, sum squared error
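
For context, the baseline the abstract refers to can be sketched as follows: standard K-means++ seeding, a few Lloyd iterations, and the sum of squared errors recorded over repeated runs with the same K. This is only an illustration of the original K-means++ procedure and of how run-to-run variation in the SSE might be observed. It is not the PK-means++ method, whose details are not given on this page. All function names, the synthetic data, and the parameter values are assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_init(points, k, rng):
    """Standard K-means++ seeding (not PK-means++).

    Each new center is drawn with probability proportional to its squared
    distance to the nearest center chosen so far. Assumes the data contain
    at least k distinct points.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points, centers, iters=20):
    """Plain Lloyd iterations; an empty cluster keeps its previous center."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for members, c in zip(clusters, centers)
        ]
    return centers


def sse_over_runs(points, k, runs, seed=0):
    """Repeat seeding and clustering `runs` times; return the final SSE of each run."""
    rng = random.Random(seed)
    return [sse(points, lloyd(points, kmeanspp_init(points, k, rng))) for _ in range(runs)]


if __name__ == "__main__":
    # Hypothetical scattered data: 200 uniformly random points in a 10 x 10 square.
    data_rng = random.Random(42)
    points = [(data_rng.uniform(0, 10), data_rng.uniform(0, 10)) for _ in range(200)]
    results = sse_over_runs(points, k=4, runs=10)
    print("min SSE:", min(results), "max SSE:", max(results))
```

The spread between the smallest and largest SSE across runs is one simple way to quantify the kind of instability the abstract describes for a fixed K.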

CLC Number:

  • TP39