Journal of Jilin University (Science Edition), 2021, Vol. 59, Issue (6): 1455-1460.

An Efficient Yinyang k-Means Clustering Algorithm

LI Changming1, ZHANG Hongchen1, WANG Chao2, LI Xiaoguang2, LU Yang3, QIAN Chaoyue3

  1. Engineering Technology Research and Development Center, Changchun Guanghua University, Changchun 130033, China;
  2. School of Electrical Information, Changchun Guanghua University, Changchun 130033, China;
  3. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
  • Received: 2020-12-08 Online: 2021-11-26 Published: 2021-11-26
  • Corresponding author: ZHANG Hongchen E-mail: 390088762@qq.com

Abstract: To address the low computational efficiency of the traditional Yinyang k-means algorithm, which does not exploit the structure of the data, we propose an efficient Yinyang k-means clustering algorithm. The algorithm decomposes the original data layer by layer according to data similarity and builds a full m-ary tree to store the data of each layer. Weighted data are constructed from the information stored in the leaf nodes of the tree, and a weighted Yinyang k-means algorithm is run on them to obtain converged centers. The traditional Yinyang k-means algorithm is then run on the original data, initialized with these converged centers, to further optimize the objective function value. Comparative experiments against k-means, traditional Yinyang k-means, and two other acceleration algorithms on five UCI data sets show that the proposed algorithm achieves a high speedup ratio, with solution accuracy essentially the same as that of traditional Yinyang k-means clustering.

Key words: cluster analysis, Yinyang k-means algorithm, k-means algorithm, data weighting

CLC number:

  • TP391
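
The two-stage procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. The abstract does not say how the data are decomposed by similarity, so the sketch splits each node by assigning points to the nearest of m randomly chosen seeds, and it drops empty children, so the tree is not necessarily full. Plain weighted Lloyd iterations stand in for the weighted Yinyang k-means step; Yinyang k-means speeds up Lloyd's algorithm with distance bounds but produces the same assignments. All function names and parameters (`leaf_summaries`, `weighted_kmeans`, `two_stage_kmeans`, `depth`, `seed`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points, weights):
    """Weighted centroid of a non-empty list of points."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]


def weighted_kmeans(points, weights, centers, max_iter=100):
    """Weighted Lloyd iterations (stand-in for weighted Yinyang k-means)."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for i, p in enumerate(points):
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            groups[j].append(i)
        new_centers = []
        for j, idx in enumerate(groups):
            if idx:
                new_centers.append(weighted_mean([points[i] for i in idx],
                                                 [weights[i] for i in idx]))
            else:
                new_centers.append(centers[j])  # empty cluster keeps its center
        if new_centers == centers:  # assignments stable, so centers repeat
            break
        centers = new_centers
    return centers


def leaf_summaries(points, m, depth, rng):
    """Split points into up to m children per level (assumed rule: nearest
    random seed) and return one (centroid, count) pair per non-empty leaf."""
    if depth == 0 or len(points) <= m:
        return [(weighted_mean(points, [1] * len(points)), len(points))]
    seeds = rng.sample(points, m)
    children = [[] for _ in range(m)]
    for p in points:
        j = min(range(m), key=lambda c: sq_dist(p, seeds[c]))
        children[j].append(p)
    leaves = []
    for child in children:
        if child:
            leaves.extend(leaf_summaries(child, m, depth - 1, rng))
    return leaves


def two_stage_kmeans(data, k, m=2, depth=3, seed=0):
    """Cluster the weighted leaf summaries first, then refine on the raw data."""
    rng = random.Random(seed)
    leaves = leaf_summaries(data, m, depth, rng)
    reps = [centroid for centroid, _ in leaves]
    wts = [count for _, count in leaves]
    if len(reps) < k:
        raise ValueError("tree has fewer leaves than clusters; increase depth or m")
    init = rng.sample(reps, k)                  # initial centers among leaf centroids
    coarse = weighted_kmeans(reps, wts, init)   # stage 1: weighted data
    return weighted_kmeans(data, [1] * len(data), coarse)  # stage 2: original data
```

Calling `two_stage_kmeans(data, k)` on a list of equal-length numeric lists returns k centers. The first stage touches only the leaf summaries, which is where a method of this kind would save work; the second stage starts from centers that should already be close to a local optimum.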