An Efficient Yinyang k-Means Clustering Algorithm

Journal of Jilin University (Science Edition), 2021, Vol. 59, Issue (6): 1455-1460.

LI Changming¹, ZHANG Hongchen¹, WANG Chao², LI Xiaoguang², LU Yang³, QIAN Chaoyue³

1. Engineering Technology Research and Development Center, Changchun Guanghua University, Changchun 130033, China;
2. School of Electrical Information, Changchun Guanghua University, Changchun 130033, China;
3. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China

Received: 2020-12-08 Online: 2021-11-26 Published: 2021-11-26
Corresponding author: ZHANG Hongchen E-mail: 390088762@qq.com

Abstract

Abstract: To address the low computational efficiency of the traditional Yinyang k-means algorithm, which does not exploit the structure of the data, we propose an efficient Yinyang k-means clustering algorithm. The algorithm decomposes the original data layer by layer according to data similarity and builds a full m-ary tree to store the data of each layer. Weighted data are then constructed from the information stored in the leaf nodes of the tree, and a weighted Yinyang k-means algorithm is run on them to obtain converged centers. These centers serve as the initialization for running the traditional Yinyang k-means algorithm on the original data, which further optimizes the objective function value. Comparative experiments against k-means, traditional Yinyang k-means, and two other acceleration algorithms on five UCI data sets show that the proposed algorithm achieves a high speedup ratio, while its solution accuracy is essentially the same as that of traditional Yinyang k-means clustering.

Key words: cluster analysis, Yinyang k-means algorithm, k-means algorithm, data weighting

CLC number: TP391
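
The two-stage procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not settle: the splitting rule (a few m-way Lloyd iterations per tree node), the leaf summary (leaf mean as the representative point, leaf size as its weight), and all function names (`leaf_groups`, `weighted_lloyd`, `tree_seeded_kmeans`) are hypothetical. Plain Lloyd iterations are swapped in for Yinyang k-means; Yinyang k-means is an exact acceleration, so from the same initialization it is expected to reach the same centers, only faster. Empty children are dropped here, so the tree is not strictly full.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_mean(points, weights):
    """Weighted centroid of a non-empty list of points."""
    total = sum(weights)
    return tuple(sum(w * p[d] for p, w in zip(points, weights)) / total
                 for d in range(len(points[0])))

def weighted_lloyd(points, weights, centers, max_iter=100):
    """Weighted Lloyd iterations (stand-in for weighted Yinyang k-means)."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))
                  for p in points]
        # Update step: weighted mean per cluster; an empty cluster keeps its center.
        new_centers = []
        for j, old in enumerate(centers):
            idx = [i for i, lab in enumerate(labels) if lab == j]
            new_centers.append(
                weighted_mean([points[i] for i in idx], [weights[i] for i in idx])
                if idx else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def leaf_groups(points, m, depth, rng):
    """Split the data m ways per level (assumed rule) and return the leaf groups."""
    if depth == 0 or len(points) <= m:
        return [points]
    seeds = rng.sample(points, m)
    _, labels = weighted_lloyd(points, [1.0] * len(points), seeds, max_iter=5)
    leaves = []
    for j in range(m):
        child = [p for p, lab in zip(points, labels) if lab == j]
        if child:  # simplification: empty children are dropped
            leaves.extend(leaf_groups(child, m, depth - 1, rng))
    return leaves

def tree_seeded_kmeans(data, k, m=3, depth=2, seed=0):
    """Cluster the weighted leaf summaries, then refine on the original data."""
    rng = random.Random(seed)
    leaves = leaf_groups(data, m, depth, rng)
    if len(leaves) < k:
        raise ValueError("tree too shallow: fewer leaves than clusters")
    reps = [weighted_mean(leaf, [1.0] * len(leaf)) for leaf in leaves]
    wts = [float(len(leaf)) for leaf in leaves]
    # Stage 1: weighted clustering of the (much smaller) leaf summaries.
    coarse, _ = weighted_lloyd(reps, wts, rng.sample(reps, k))
    # Stage 2: refinement on the original data, started from the coarse centers.
    return weighted_lloyd(data, [1.0] * len(data), coarse)

# Small synthetic demo: three Gaussian blobs in the plane.
rng = random.Random(1)
blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 6.0)]
data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
        for cx, cy in blobs for _ in range(30)]
centers, labels = tree_seeded_kmeans(data, k=3)
print(centers)
```

The intended saving in this sketch is that stage 1 iterates over a handful of weighted leaf summaries rather than every data point, so stage 2 usually starts close to a local optimum and needs few passes over the full data.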

LI Changming, ZHANG Hongchen, WANG Chao, LI Xiaoguang, LU Yang, QIAN Chaoyue. An Efficient Yinyang k-Means Clustering Algorithm[J]. Journal of Jilin University (Science Edition), 2021, 59(6): 1455-1460.
