Journal of Jilin University (Engineering and Technology Edition) ›› 2021, Vol. 51 ›› Issue (5): 1808-1816. doi: 10.13229/j.cnki.jdxbgxb20200645

• Computer Science and Technology •

SP-k-means-+ algorithm based on simulated partition

Yong YANG1,2, Qiang CHEN1, Fu-heng QU1, Jun-jie LIU1, Lei ZHANG1

  1. College of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
  2. College of Education, Changchun Normal University, Changchun 130032, China
  • Received: 2020-08-21    Online: 2021-09-01    Published: 2021-09-16
  • About the author: YANG Yong (1970-), male, professor, Ph.D. Research interests: graphics processing, pattern recognition, software engineering. E-mail: yy@cust.edu.cn
  • Supported by: Scientific Research Project of the Education Department of Jilin Province (JJKH20181164KJ); National Natural Science Foundation of China (41671397); Jilin Province Education Science "13th Five-Year Plan" Project (GH19086)

Abstract:

The k-means algorithm easily falls into local optima, which weakens the optimization of its objective function, especially on data sets with many elements, high dimensionality, and a large number of clusters. To address this, the iterative k-means minus-plus (I-k-means-+) algorithm was proposed in 2018 as a new objective-function optimization algorithm for k-means: it iteratively improves the quality of the clustering solution by removing one cluster (minus), splitting another (plus), and refining the affected clusters with local k-means. However, I-k-means-+ estimates the gain and cost values of each cluster only roughly, trading accuracy for lower computational complexity and limiting how far the objective function can be optimized. To address this problem, this paper proposes a modified I-k-means-+ algorithm, SP-k-means-+, which computes the gain and cost values of each cluster more accurately from a simulated partition of every cluster. This reduces missed detections and misjudgments during cluster-pair matching, avoids the redundant computation caused by invalid iterations in I-k-means-+, and selects more suitable cluster pairs to split and remove in each iteration; the simulated partition itself is accelerated by a new heuristic algorithm. As a result, SP-k-means-+ further optimizes the objective function while also improving part of the computation. Experiments on real data sets show that, given the same initial clustering solution, when no minus-plus operation is required the two algorithms reach the same objective function value and SP-k-means-+ runs 16% faster; when minus-plus operations are required, SP-k-means-+ improves objective-function optimization over I-k-means-+ by more than 10% and by up to 47%, with comparable running time.

Key words: artificial intelligence, I-k-means-+, objective function optimization, divide and remove, simulated partition, suitable cluster pairs
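
To make the gain and cost values discussed in the abstract concrete, one plausible formalization is sketched below. The notation is introduced here for illustration and may differ from the paper's own definitions: removing a cluster raises the sum of squared errors (SSE) because its points fall back to the nearest remaining centers, while splitting a cluster lowers it.

% Hypothetical notation: C_j is cluster j with center c_j; C_j^{(1)}, C_j^{(2)} with
% centers c_j^{(1)}, c_j^{(2)} are the two parts of a simulated two-way split of C_j.
\[
\mathrm{Cost}(C_j) \;=\; \sum_{x \in C_j} \Bigl( \min_{i \neq j} \lVert x - c_i \rVert^{2} \;-\; \lVert x - c_j \rVert^{2} \Bigr),
\]
\[
\mathrm{Gain}(C_j) \;=\; \sum_{x \in C_j} \lVert x - c_j \rVert^{2} \;-\; \sum_{x \in C_j^{(1)}} \lVert x - c_j^{(1)} \rVert^{2} \;-\; \sum_{x \in C_j^{(2)}} \lVert x - c_j^{(2)} \rVert^{2}.
\]

A minus-plus move (delete the cluster with the smallest Cost, split the cluster with the largest Gain) pays off only when the chosen Gain exceeds the chosen Cost; the abstract indicates that SP-k-means-+ obtains these values from a simulated partition of every cluster rather than a rougher estimate.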

CLC number: TP391
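
As a further illustration, the sketch below gives a minimal Python reading of one minus-plus step in which the split gain comes from an explicit simulated 2-means partition, as the abstract describes. It is not the authors' implementation: the seeding of the trial split, the single-pair selection rule, the acceptance test (gain must exceed cost) and the helper names simulate_split_gain, removal_cost and minus_plus_step are all assumptions introduced here, and the heuristic acceleration of the simulated partition mentioned in the abstract is not modeled.

```python
# Illustrative sketch only (not the paper's code): estimate each cluster's split
# gain via a simulated 2-means partition and its removal cost via reassignment,
# then apply the best minus-plus move if it is profitable. Assumes every cluster
# is non-empty; all names and details are assumptions made for illustration.
import numpy as np

def sse(points, center):
    """Sum of squared distances from a set of points to one center."""
    return float(((points - center) ** 2).sum())

def simulate_split_gain(points, n_iter=10):
    """Run a small 2-means partition on one cluster alone and return the SSE
    reduction ("gain") an actual split would achieve, plus the two sub-centers."""
    center = points.mean(axis=0)
    if len(points) < 2:                        # nothing to split
        return 0.0, (center, center)
    # crude seeding for the trial split: the farthest point and its mirror image
    far = points[np.argmax(((points - center) ** 2).sum(axis=1))]
    c1, c2 = far, 2.0 * center - far
    mask = np.ones(len(points), dtype=bool)
    for _ in range(n_iter):                    # local 2-means restricted to this cluster
        d1 = ((points - c1) ** 2).sum(axis=1)
        d2 = ((points - c2) ** 2).sum(axis=1)
        mask = d1 <= d2
        if mask.all() or (~mask).all():        # degenerate split: keep current centers
            break
        c1, c2 = points[mask].mean(axis=0), points[~mask].mean(axis=0)
    split_sse = sse(points[mask], c1) + sse(points[~mask], c2)
    return sse(points, center) - split_sse, (c1, c2)

def removal_cost(points, own_center, other_centers):
    """SSE increase ("cost") of deleting this cluster and reassigning each of
    its points to the nearest remaining center."""
    d_other = ((points[:, None, :] - other_centers[None, :, :]) ** 2).sum(axis=2)
    return float(d_other.min(axis=1).sum()) - sse(points, own_center)

def minus_plus_step(X, labels, centers):
    """One simulated-partition minus-plus step: remove the cheapest cluster and
    split the most profitable one, if the move pays off. Returns (centers, moved)."""
    k = centers.shape[0]
    gains, costs, splits = np.empty(k), np.empty(k), [None] * k
    for j in range(k):
        pts = X[labels == j]
        gains[j], splits[j] = simulate_split_gain(pts)
        costs[j] = removal_cost(pts, centers[j], np.delete(centers, j, axis=0))
    to_split, to_remove = int(np.argmax(gains)), int(np.argmin(costs))
    if to_split == to_remove or gains[to_split] <= costs[to_remove]:
        return centers, False                  # no profitable -+ move (simplified rule)
    keep = [j for j in range(k) if j not in (to_split, to_remove)]
    new_centers = np.vstack([centers[keep], splits[to_split][0], splits[to_split][1]])
    return new_centers, True                   # caller refines with local k-means, then repeats
```

A full loop in the spirit of SP-k-means-+ would alternate minus_plus_step with a local k-means refinement of the affected clusters and stop when no profitable cluster pair remains.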