Journal of Jilin University (Science Edition), 2018, Vol. 56, Issue (5): 1187-1192.

• Computer Science •

基于最小生成树的多层次k-Means聚类算法及其在数据挖掘中的应用

金晓民1,2, 张丽萍3   

  1. 1. 内蒙古大学 交通学院, 呼和浩特 010021;2. 内蒙古自治区桥梁检测与维修加固工程技术研究中心, 呼和浩特 010070;3. 内蒙古师范大学 计算机科学技术学院, 呼和浩特 010022
  • 收稿日期:2018-01-24 出版日期:2018-09-26 发布日期:2018-11-22
  • 通讯作者: 金晓民 E-mail:jxm_nd@sina.com

Multilevel k-Means Clustering Algorithm Based onMinimum Spanning Tree and Its Application in Data Mining#br#

JIN Xiaomin1,2, ZHANG Liping3   

  1. 1. Institute of Transportation, Inner Mongolia University, Hohhot 010021, China;2. Inner Mongolia Engineering Research Center of Testing and Strengthening for Bridges, Hohhot 010070, China;3. College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China
  • Received:2018-01-24 Online:2018-09-26 Published:2018-11-22

Abstract: To address the slow mining speed and low accuracy of traditional clustering algorithms, we proposed a multilevel k-means clustering algorithm based on a minimum spanning tree (MST) and applied it to data mining. First, we analyzed the data types of the clustering samples and designed a clustering criterion function from the results of that analysis. Second, we partitioned the sample data with the MST and selected the initial cluster centers. To do so, the data space of the samples was divided into rectangular cells, and within these cells the sample objects were evaluated, sorted in descending order, and selected. This yielded effective initial cluster centers and reduced the time spent on data mining. Experimental results show that, compared with traditional algorithms, the proposed algorithm mines data quickly and accurately, improving mining efficiency by about 50%.
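The abstract describes the pipeline only at a high level. The Python sketch below shows one plausible reading of it. Several assumptions here are ours and do not come from the paper. Distances are Euclidean. The MST partition is obtained by cutting the k-1 longest MST edges. Each MST group is seeded from the mean of its most populated rectangular grid cell: cells are counted, sorted in descending order, and the top one is selected. The usual k-means squared-error criterion stands in for the paper's data-type-specific criterion function. All function names and the `cells_per_axis` parameter are hypothetical.

```python
# Hypothetical sketch, not the authors' implementation: MST partition -> grid-cell
# seeding -> standard k-means, under the assumptions stated in the lead-in.
import math
from collections import defaultdict


def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(pts) for col in zip(*pts))


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2); returns (length, u, v)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge connecting each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_partition(points, k):
    """Drop the k-1 longest MST edges; the remaining forest has k components."""
    edges = sorted(mst_edges(points))
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, a, b in edges[: len(edges) - (k - 1)]:
        root[find(a)] = find(b)
    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    return list(groups.values())


def grid_seed(points, group, cells_per_axis, lo, hi):
    """Bin a group's points into rectangular cells, sort cells by count (descending),
    and return the mean of the densest cell as the group's initial center."""
    cells = defaultdict(list)
    for i in group:
        key = tuple(
            min(int((x - l) / (h - l) * cells_per_axis), cells_per_axis - 1) if h > l else 0
            for x, l, h in zip(points[i], lo, hi)
        )
        cells[key].append(points[i])
    densest = sorted(cells.values(), key=len, reverse=True)[0]
    return mean(densest)


def kmeans(points, centers, max_iter=100):
    """Lloyd iterations (squared-error criterion) starting from the given seeds."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new = [mean([p for p, l in zip(points, labels) if l == c]) if c in labels else centers[c]
               for c in range(len(centers))]
        if new == centers:  # centers unchanged: converged
            break
        centers = new
    return labels, centers


def mst_grid_kmeans(points, k, cells_per_axis=4):
    assert 1 <= k <= len(points)
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    seeds = [grid_seed(points, g, cells_per_axis, lo, hi) for g in mst_partition(points, k)]
    return kmeans(points, seeds)
```

For example, with two well-separated groups of four points and k = 2, the single long "bridge" edge of the MST is the one that gets cut. Each seed then lands at its group's mean, so k-means converges after one pass. The O(n²) Prim step keeps the sketch dependency-free. For large data sets, a sparse-graph MST routine such as SciPy's `scipy.sparse.csgraph.minimum_spanning_tree` would be the more practical choice.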

Key words: minimum spanning tree, multilevel k-means clustering algorithm, data mining

CLC number: TP301