J4 ›› 2012, Vol. 50 ›› Issue (06): 1179-1184.

• Computer Science

An Efficient and Robust Clustering Algorithm for Unsupervised Fuzzy c-Means

QU Fu-heng1, HU Ya-ting2, MA Si-liang3, GUO Shi-long4, LI Heng-yan5

  1. College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China; 2. College of Information and Technology, Jilin Agricultural University, Changchun 130118, China; 3. Institute of Mathematics, Jilin University, Changchun 130012, China; 4. Department of Information Technology, Beijing Rural Commercial Bank, Beijing 100033, China; 5. School of Mathematics and Information, North China University of Water Resources and Electric Power, Zhengzhou 450011, China
  • Received: 2012-03-22 Online: 2012-11-26 Published: 2012-11-26
  • Contact: QU Fu-heng E-mail: qufuheng@163.com

Abstract:

Data reduction is first applied to condense the data set without losing its cluster structure. The proposed approximate fuzzy c-means (FCM) clustering algorithm then partitions the reduced data to obtain initial cluster centers. Starting from these centers, FCM combined with a cluster validity index performs unsupervised clustering of the data. This addresses two weaknesses of the conventional unsupervised FCM algorithm: clustering performance that depends heavily on the initial centers, and poor computational efficiency on large data sets. Comparative experiments with existing algorithms show that the proposed algorithm achieves higher accuracy and computational efficiency and obtains a more reasonable number of clusters.

Key words: fuzzy c-means, cluster validity, unsupervised clustering, data reduction

CLC Number:

  • TP391.4