吉林大学学报(工学版) ›› 2018, Vol. 48 ›› Issue (1): 306-311.doi: 10.13229/j.cnki.jdxbgxb20161193

• 论文 • 上一篇    下一篇

优化初值的C均值算法

刘云, 康冰, 侯涛, 王柯, 刘富   

  1. 吉林大学 通信工程学院,长春 130022
  • 收稿日期:2016-10-28 出版日期:2018-02-26 发布日期:2018-02-26
  • 通讯作者: 刘富(1968-),男,教授,博士生导师.研究方向:计算机视觉及模式识别.E-mail: liufu@jlu.edu.cn
  • 作者简介:刘云(1989-),男,博士研究生.研究方向:模式识别及生物信息学.E-mail: liuyun11@mails.jlu.edu.cn
  • 基金资助:
    国家自然科学基金项目(61503151); 吉林省自然科学基金项目(10100505); 吉林省重点科技攻关项目(20140204046GX)

Optimal initialization-based C-means method

LIU Yun, KANG Bing, HOU Tao, WANG Ke, LIU Fu   

  1. College of Communication Engineering, Jilin University, Changchun 130022, China
  • Received:2016-10-28 Online:2018-02-26 Published:2018-02-26

摘要: 针对C均值算法(C-means method,CM)对初值敏感、易陷入局部最优的问题,提出一种优化初值的C均值算法(Optimal initialization-based CM,OICM)。该算法首先计算数据集中每个点的邻域以及邻域密度,选择具有最大邻域密度的点作为第一个聚类中心;然后,从剩余的数据集中选择具有最大邻域密度、且其邻域与已有聚类中心的邻域的连接度满足一定条件的点作为下一个聚类中心,以此类推,直到确定了C个聚类中心;最后,利用C均值算法完成数据集的聚类分析。在仿真数据集和UCI数据集上进行聚类实验,结果表明OICM算法有效地克服了传统C均值算法对初值敏感的缺点,且性能优于其他3种典型的全局C均值算法。

关键词: 计算机应用, C均值算法, 初值敏感, 邻域密度

Abstract: C-means Clustering Method (CM) is a widely for data clustering, which is sensitive to the initial cluster centers and easily leads to local optimum. To solve this problem, an Optimal Initialization-based C-means Method (OI-CM) is proposed. First for each point in the dataset, the neighborhood and neighborhood density are calculated, and the point with the maximum neighborhood density is selected as the first cluster center. Then, the point with the maximum neighborhood density from the rest datasets is selected as the next cluster center, whose neighborhood must have little coupling degree with the neighborhoods of existing cluster centers. This procedure is continued until all the cluster centers are selected. Finally, the CM is utilized to cluster the datasets with the selected cluster centers. Experimental results on simulated and UCI datasets show that the proposed OI-CM can effectively solve the sensitivity defect of the traditional CM to initial cluster centers, and has superior performance than other three global CMs.

Key words: computer application, C-means method, initial value sensitivity, neighborhood density

中图分类号: 

  • TP391
[1] 邢涛, 黄友红, 胡庆荣, 等. 基于动态K均值聚类算法的SAR图像分割[J]. 中国科学院大学学报, 2016, 33(5): 674-678.
Xing Tao, Huang You-hong, Hu Qing-rong, et al.SAR image segmentation based on dynamical K-means clustering algorithm[J]. Journal of University of Chinese Academy of Sciences, 2016, 33(5): 674-678.
[2] 李昌兴, 黄艳虎, 支晓斌, 等. 基于加速k均值的谱聚类图像分割算法改进[J]. 传感器与微系统, 2016, 35(9): 137-140.
Li Chang-xing, Huang Yan-hu, Zhi Xiao-bin, et al.Improvements of acceleration k-means based spectral clustering algorithm for image segmentation[J]. Transducer and Microsystem Technologies, 2016, 35(9): 137-140.
[3] 刘华春, 候向宁, 杨忠. 基于改进K均值算法的入侵检测系统设计[J]. 计算机技术与发展, 2016, 26(1): 101-105.
Liu Hua-chun, Hou Xiang-ning, Yang Zhong.Design of intrusion detection system based on improved K-means algorithm[J]. Computer Technology and Development, 2016, 26(1): 101-105.
[4] 徐森, 卢志茂, 顾国昌. 结合K均值和非负矩阵分解集成文本聚类算法[J]. 吉林大学学报:工学版, 2011, 41(4): 1077-1082.
Xu Sen, Lu Zhi-mao, Gu Guo-chang.Integrating K-means and non-negative matrix factorization to ensemble document clustering[J]. Journal of Jilin University (Engineering and Technology Edition), 2011, 41(4): 1077-1082.
[5] 詹森, 秦大同, 曾育平. 基于遗传优化K均值聚类算法工况识别的混合动力汽车能量管理策略[J]. 中国公路学报, 2016, 29(4): 130-137,152.
Zhan Sen, Qin Da-tong, Zeng Yu-ping.Energy management strategy of HEV based on driving cycle recognition using genetic optimized K-means clustering algorithm[J]. China Journal of Highway and Transport, 2016, 29(4): 130-137,152.
[6] Krishna K, Murty M N.Genetic K-means algorithm[J]. IEEE Transactions on Systems Man & Cybernetics Part B: Cybernetics, 1999, 29(3): 433-439.
[7] Laszlo M, Mukherjee S.A genetic algorithm using hyper-quadtrees for low-dimensional K-means clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(4): 533-543.
[8] Likas A, Vlassis N, Verbeek J J.The global K-means clustering algorithm[J]. Pattern Recognition, 2002, 36(2): 451-461.
[9] Bagirov A M.Modified global K-means algorithm for minimum sum-of-squares clustering problems[J]. Pattern Recognition, 2008, 41(10): 3192-3199.
[10] Bai L, Liang J, Sui C, et al.Fast global K-means clustering based on local geometrical information[J]. Information Sciences, 2013, 245(10): 168-180.
[11] Cao F, Liang J, Jiang G.An initialization method for the K-means algorithm using neighborhood model[J]. Computers & Mathematics with Applications, 2009, 58(3): 474-483.
[12] Fatemi S B, Mobasheri M R, Abkar A A.Improving the accuracy of multispectral image clustering by means of a new initializing method[J]. Journal of the Indian Society of Remote Sensing, 2016, 44(4): 643-650.
[13] 谢娟英, 郭文娟, 谢维信, 等. 基于样本空间分布密度的初始聚类中心优化K-均值算法[J]. 计算机应用研究, 2012, 29(3): 888-892.
Xie Juan-ying, Guo Wen-juan, Xie Wei-xin, et al.K-means clustering algorithm based on optimal initial centers related to pattern distribution of samples in space[J]. Application Research of Computers, 2012,29(3): 888-892.
[14] 赖玉霞, 刘建平. K-means算法的初始聚类中心的优化[J]. 计算机工程与应用, 2008, 44(10): 147-149.
Lai Yu-xia, Liu Jian-ping.Optimization study on initial center of K-means algorithm[J]. Computer Engineering and Applications, 2008,44(10): 147-149.
[1] 刘富,宗宇轩,康冰,张益萌,林彩霞,赵宏伟. 基于优化纹理特征的手背静脉识别系统[J]. 吉林大学学报(工学版), 2018, 48(6): 1844-1850.
[2] 王利民,刘洋,孙铭会,李美慧. 基于Markov blanket的无约束型K阶贝叶斯集成分类模型[J]. 吉林大学学报(工学版), 2018, 48(6): 1851-1858.
[3] 金顺福,王宝帅,郝闪闪,贾晓光,霍占强. 基于备用虚拟机同步休眠的云数据中心节能策略及性能[J]. 吉林大学学报(工学版), 2018, 48(6): 1859-1866.
[4] 赵东,孙明玉,朱金龙,于繁华,刘光洁,陈慧灵. 结合粒子群和单纯形的改进飞蛾优化算法[J]. 吉林大学学报(工学版), 2018, 48(6): 1867-1872.
[5] 刘恩泽,吴文福. 基于机器视觉的农作物表面多特征决策融合病变判断算法[J]. 吉林大学学报(工学版), 2018, 48(6): 1873-1878.
[6] 欧阳丹彤, 范琪. 子句级别语境感知的开放信息抽取方法[J]. 吉林大学学报(工学版), 2018, 48(5): 1563-1570.
[7] 刘富, 兰旭腾, 侯涛, 康冰, 刘云, 林彩霞. 基于优化k-mer频率的宏基因组聚类方法[J]. 吉林大学学报(工学版), 2018, 48(5): 1593-1599.
[8] 桂春, 黄旺星. 基于改进的标签传播算法的网络聚类方法[J]. 吉林大学学报(工学版), 2018, 48(5): 1600-1605.
[9] 刘元宁, 刘帅, 朱晓冬, 陈一浩, 郑少阁, 沈椿壮. 基于高斯拉普拉斯算子与自适应优化伽柏滤波的虹膜识别[J]. 吉林大学学报(工学版), 2018, 48(5): 1606-1613.
[10] 车翔玖, 王利, 郭晓新. 基于多尺度特征融合的边界检测算法[J]. 吉林大学学报(工学版), 2018, 48(5): 1621-1628.
[11] 赵宏伟, 刘宇琦, 董立岩, 王玉, 刘陪. 智能交通混合动态路径优化算法[J]. 吉林大学学报(工学版), 2018, 48(4): 1214-1223.
[12] 黄辉, 冯西安, 魏燕, 许驰, 陈慧灵. 基于增强核极限学习机的专业选择智能系统[J]. 吉林大学学报(工学版), 2018, 48(4): 1224-1230.
[13] 傅文博, 张杰, 陈永乐. 物联网环境下抵抗路由欺骗攻击的网络拓扑发现算法[J]. 吉林大学学报(工学版), 2018, 48(4): 1231-1236.
[14] 曹洁, 苏哲, 李晓旭. 基于Corr-LDA模型的图像标注方法[J]. 吉林大学学报(工学版), 2018, 48(4): 1237-1243.
[15] 侯永宏, 王利伟, 邢家明. 基于HTTP的动态自适应流媒体传输算法[J]. 吉林大学学报(工学版), 2018, 48(4): 1244-1253.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!