Journal of Jilin University (Information Science Edition), 2024, Vol. 42, Issue (3): 430-437.

Big Data Clustering Processing Method Based on an Improved PSO-Means Algorithm

JIANG Darui, XU Shengchao

  1. School of Data Science, Guangzhou Huashang College, Guangzhou 511300, China
  • Received: 2023-07-11 Online: 2024-06-18 Published: 2024-06-17
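  • About the authors: JIANG Darui (b. 1986), male, from Xuwen, Guangdong; associate professor at Guangzhou Huashang College and doctoral candidate; research interests: statistics and data mining; (Tel) 86-15013279798, (E-mail) jiangdarui_2023@126.com. XU Shengchao (b. 1980), male, from Wuhan; associate professor at Guangzhou Huashang College; research interest: parallel and distributed processing software; (Tel) 86-13824483568, (E-mail) isdooropen@126.com.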
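  • Funding:
    National Natural Science Foundation of China (61772221); Guangzhou Huashang College In-School Mentorship Research Fund (2023HSDS08); Guangzhou Huashang College 2023 Special Research Project on Innovation and Entrepreneurship Education (HS2023CXCY04)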

Method of Large Data Clustering Processing Based on Improved PSO Means Clustering Algorithm

JIANG Darui, XU Shengchao   

  • Received: 2023-07-11 Online: 2024-06-18 Published: 2024-06-17

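Abstract: Big data clustering performs poorly on different types of data and takes a long time. To address these problems, a big data clustering method based on an improved PSO-Means (Particle Swarm Optimization Means) algorithm is proposed. The method uses the particle swarm optimization algorithm to determine the flight time and flight direction of each particle within a single clustering pass. It presets the selection range of the initial cluster centers and appropriately adjusts the particles' inertia weight, which eliminates the clustering defects caused by particle oscillation and yields cluster centers for large-scale data. Combined with a spanning tree algorithm, the PSO algorithm is further optimized with respect to two aspects, sample deviation and centroid skewness. The optimized cluster centers are then fed into the k-means clustering algorithm to cluster the big data. Experimental results show that the improved PSO-Means method clusters different types of data effectively, with a clustering time of only 0.3 s, which confirms its good clustering performance and efficiency.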

Keywords: large-scale data, particle swarm optimization, optimum seeking, k-means clustering algorithm, data clustering

Abstract: Big data clustering suffers from poor clustering quality on different types of data and long clustering times. To address this, a big data clustering method based on an improved PSO-Means (Particle Swarm Optimization Means) clustering algorithm is proposed. The particle swarm optimization (PSO) algorithm is used to determine the flight time and direction of each particle during a clustering pass. The method also presets the selection range of the initial cluster centers and appropriately adjusts the particles' inertia weight. This eliminates the clustering defects caused by particle oscillation and yields cluster centers for large-scale data. Combined with a spanning tree algorithm, the PSO algorithm is further optimized with respect to two aspects, sample skewness and centroid skewness. The optimized cluster centers are then fed into the k-means clustering algorithm to cluster the big data. Experimental results show that the proposed method clusters different types of data effectively, with a clustering time of only 0.3 s, which verifies its good clustering performance and efficiency.

Key words: large-scale data, particle swarm optimization, optimization, k-means clustering algorithm, clustering

CLC number:

  • TP393.4
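The abstract describes a two-stage pipeline. A PSO search runs first, and its best particle supplies the initial cluster centers. k-means then refines those centers. The Python sketch below shows that general PSO + k-means hybrid under stated assumptions; it is not the authors' implementation.

The sketch makes these assumptions:
- Each particle encodes k candidate centroids, and its fitness is the within-cluster sum of squared errors (SSE).
- The search is restricted to the data's bounding box, as one possible reading of "presetting the selection range".
- The inertia weight decreases linearly from `w_max` to `w_min`, and a velocity clamp damps oscillation.

These choices, the parameter values, and the function names (`pso_init_centroids`, `kmeans`, `sse`, `split`) are hypothetical. The abstract does not specify the spanning-tree step or the deviation/skewness optimization, so both are omitted.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sse(points: Sequence[Point], centroids: Sequence[Point]) -> float:
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sum((p[j] - c[j]) ** 2 for j in range(len(p))) for c in centroids)
               for p in points)


def split(flat: Sequence[float], k: int, d: int) -> List[Point]:
    """Turn a flat particle position of length k*d into k centroids."""
    return [tuple(flat[i * d:(i + 1) * d]) for i in range(k)]


def pso_init_centroids(points, k, n_particles=20, iters=50,
                       w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, seed=0):
    """Hypothetical PSO search for k initial centroids (assumed parameters)."""
    rng = random.Random(seed)
    d = len(points[0])
    # Assumption: restrict every coordinate to the data's bounding box.
    lo = [min(p[j] for p in points) for j in range(d)] * k
    hi = [max(p[j] for p in points) for j in range(d)] * k
    # Velocity clamp limits the step size, damping particle oscillation.
    vmax = [0.2 * (h - l) for l, h in zip(lo, hi)]

    def fitness(x):
        return sse(points, split(x, k, d))

    # Each particle starts at k randomly chosen data points.
    xs = [[v for p in rng.sample(points, k) for v in p] for _ in range(n_particles)]
    vs = [[0.0] * (k * d) for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [fitness(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for t in range(iters):
        # Linearly decreasing inertia weight: explore early, exploit late.
        w = w_max - (w_max - w_min) * t / max(iters - 1, 1)
        for i in range(n_particles):
            for j in range(k * d):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[i][j] + c1 * r1 * (pbest[i][j] - xs[i][j])
                     + c2 * r2 * (gbest[j] - xs[i][j]))
                vs[i][j] = max(-vmax[j], min(vmax[j], v))
                xs[i][j] = max(lo[j], min(hi[j], xs[i][j] + vs[i][j]))
            f = fitness(xs[i])
            if f < pbest_f[i]:  # update personal and global bests
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i][:], f
    return split(gbest, k, d)


def kmeans(points, centroids, iters=100):
    """Standard Lloyd's k-means, seeded with the given centroids."""
    centroids = [tuple(c) for c in centroids]

    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))

    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, [nearest(p) for p in points]


# Usage on synthetic data: two well-separated Gaussian blobs (illustrative only).
_rng = random.Random(42)
data = ([(_rng.gauss(0, 0.3), _rng.gauss(0, 0.3)) for _ in range(50)]
        + [(_rng.gauss(5, 0.3), _rng.gauss(5, 0.3)) for _ in range(50)])
init = pso_init_centroids(data, k=2)
centers, labels = kmeans(data, init)
print(centers)
```

Seeding k-means with the swarm's best particle rather than random points is the point of the hybrid. The swarm's global search lowers the risk that Lloyd's iterations settle in a poor local optimum. The k-means stage cannot increase the SSE, because each assignment step and each mean update can only lower it or leave it unchanged.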