Journal of Jilin University (Information Science Edition), 2024, Vol. 42, Issue 3: 430-437.

Method of Large Data Clustering Processing Based on Improved PSO Means Clustering Algorithm

JIANG Darui, XU Shengchao   

  • Received: 2023-07-11 Online: 2024-06-18 Published: 2024-06-17

Abstract: Big data clustering suffers from poor clustering quality and long clustering times when applied to different types of data. To address this, a big data clustering method based on an improved PSO-Means (Particle Swarm Optimization Means) clustering algorithm is proposed. Particle swarm optimization is used to determine the flight time and direction of the unit particles during clustering, to preset the selection range of the initial cluster centers, and to adjust the inertia weight of the unit particles appropriately. This eliminates the clustering defects caused by particle oscillation and yields cluster centers for large-scale data. Combined with a spanning tree algorithm, the PSO algorithm is then optimized in two respects: sample skewness and centroid skewness. The optimized cluster centers are passed to the k-means clustering algorithm to cluster the big data. Experimental results show that the proposed method clusters different types of data effectively, with a clustering time of only 0.3 s, which verifies its clustering performance and efficiency.
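
The two-stage pipeline described in the abstract (a PSO search for initial cluster centers, followed by k-means refinement) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the spanning tree step and the sample/centroid skewness optimization are omitted, and the function names, the bounding-box initialization range, the linearly decreasing inertia weight, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def pso_init_centers(X, k, n_particles=20, n_iter=50,
                     w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, seed=0):
    """Search for k initial centers with a basic PSO (hypothetical helper).

    Each particle encodes one candidate set of k centers; fitness is the SSE.
    """
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Assumption: the "preset selection range" is the data's bounding box.
    pos = rng.uniform(lo, hi, size=(n_particles, k, X.shape[1]))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([sse(X, p) for p in pos])
    g = pbest_cost.argmin()
    gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]

    for t in range(n_iter):
        # Assumption: a linearly decreasing inertia weight, a common way to
        # damp particle oscillation late in the search.
        w = w_max - (w_max - w_min) * t / max(n_iter - 1, 1)
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)  # keep particles inside the range

        cost = np.array([sse(X, p) for p in pos])
        improved = cost < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = cost[improved]
        g = pbest_cost.argmin()
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    return gbest


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd's k-means starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # An empty cluster keeps its previous center.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

A typical call would be `centers, labels = kmeans(X, pso_init_centers(X, k=3))` for an `(n_samples, n_features)` array `X`. Because Lloyd's iterations never increase the SSE, the refined centers are at least as good, by that measure, as the PSO output they start from.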

Key words: large scale data, particle swarm optimization, optimization, k-means clustering algorithm, clustering

CLC Number: TP393.4