Journal of Jilin University (Information Science Edition) ›› 2020, Vol. 38 ›› Issue (5): 568-577.


Argo Profile Data Clustering Based on DBIRCH Algorithm

WU Man1,2, ZHANG Wanzhen3, SUN Miao1, LIN Sen4   

  1. Information Department, Guangxi Academy of Oceanography, Nanning 530022, China; 2. Technology Innovation Center of Marine Information, Ministry of Natural Resources, Tianjin 300171, China; 3. Institute of Geography and Oceanography, Guilin University of Aerospace Technology, Guilin 541004, China; 4. School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
  • Received: 2020-04-04 Online: 2020-09-24 Published: 2020-10-22

Abstract: Observed profile data from marine Argo buoys must be analyzed and processed in real time, which poses three problems: high data density, short required response time, and the need to identify clusters of arbitrary shape. To address them, this paper proposes a low-complexity clustering method that can effectively process a data set in a single scan. The algorithm, DBIRCH (Density-Based Balanced Iterative Reducing and Clustering Using Hierarchies), introduces a new parameter, the density threshold correction factor, to dynamically update the subspace threshold, the constraint coefficient that restricts the growth of the CF (Clustering Feature) tree. Drawing on the idea of density correlation, it builds CF trees in different neighborhoods and merges them over several rounds. Finally, the core CF tree sub-nodes are output as the clustering result. This avoids the excessive dependence of the BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) algorithm on its parameters and allows clusters of arbitrary shape to be handled. It improves the robustness of data processing, the timeliness of processing Argo profile monitoring data, and the overall throughput of the algorithm. To measure the comprehensive performance of the algorithm, real-time monitoring data sets of Argo buoy profiles are used in multiple groups of comparative experiments under different parameters, and several evaluation indexes are used to assess running time and clustering accuracy. Overall, DBIRCH has the best comprehensive clustering performance among the three algorithms compared: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), BIRCH, and DBIRCH.
The experimental results show that, among the three algorithms, BIRCH runs fastest but has the lowest accuracy; DBSCAN clusters better than BIRCH but runs slowest; and the improved DBIRCH algorithm runs slightly slower than BIRCH but achieves the highest clustering accuracy.
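
As a rough illustration of the single-scan idea the abstract relies on, the sketch below implements the standard BIRCH Clustering Feature, the triple (N, LS, SS), and a threshold test on the merged radius. It is a minimal sketch under stated assumptions, not the authors' DBIRCH: the CF entries are kept in a flat list rather than a tree, the threshold is a fixed parameter (the abstract does not give the rule by which the density threshold correction factor updates it), and the names `ClusteringFeature` and `single_scan` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Standard BIRCH summary of a sub-cluster: (N, LS, SS)."""
    n: int                    # number of points absorbed
    linear_sum: List[float]   # per-dimension sum of the points (LS)
    square_sum: float         # sum of squared norms of the points (SS)

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusteringFeature":
        return cls(1, list(point), sum(x * x for x in point))

    def merged(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CFs are additive, so merging never revisits the raw points.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            self.square_sum + other.square_sum,
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        # Root-mean-square distance of the points from the centroid.
        c2 = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.square_sum / self.n - c2, 0.0))


def single_scan(points: Sequence[Sequence[float]],
                threshold: float) -> List[ClusteringFeature]:
    """One pass over the data with a flat list of CFs (assumption: no tree).

    Each point joins the nearest CF if the merged radius stays within
    `threshold`; otherwise it starts a new CF.
    """
    features: List[ClusteringFeature] = []
    for p in points:
        candidate = ClusteringFeature.from_point(p)
        nearest = min(
            range(len(features)),
            key=lambda i: math.dist(features[i].centroid(), p),
            default=None,
        )
        if nearest is not None:
            trial = features[nearest].merged(candidate)
            if trial.radius() <= threshold:
                features[nearest] = trial
                continue
        features.append(candidate)
    return features
```

Because each CF is updated by addition alone, every observation is read once, which is the property that makes this family of methods attractive for real-time profile data. A smaller `threshold` yields more, tighter sub-clusters, which is why plain BIRCH results depend strongly on that parameter.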

Key words: Argo buoy, cluster analysis, balanced iterative reducing and clustering using hierarchies (BIRCH) algorithm, density-based spatial clustering of applications with noise (DBSCAN) algorithm, density-based balanced iterative reducing and clustering using hierarchies (DBIRCH) algorithm

CLC Number: 

  • TP312