Journal of Jilin University (Information Science Edition) ›› 2020, Vol. 38 ›› Issue (5): 568-577.


Clustering of Argo Profile Data Based on the DBIRCH Algorithm

Wu Man (邬满)1,2, Zhang Wanzhen (张万桢)3, Sun Miao (孙苗)1, Lin Sen (林森)4

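  1. Information Section, Guangxi Zhuang Autonomous Region Academy of Oceanography, Nanning 530022; 2. Marine Information Technology Innovation Center, Ministry of Natural Resources, Tianjin 300171; 3. Practical Teaching Department, Guilin University of Aerospace Technology, Guilin, Guangxi 541004; 4. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004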
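  • Received: 2020-04-04 Publication date: 2020-09-24 Release date: 2020-10-22
  • Corresponding author: Zhang Wanzhen (张万桢, born 1981), female, native of Wuhan, lecturer at Guilin University of Aerospace Technology; her research focuses mainly on data mining and machine learning. Tel: 86-13677737779; E-mail: wanzer@guat.edu.cn
  • About the author: Wu Man (邬满, born 1985), male, native of Huanggang, Hubei, senior engineer at the Guangxi Zhuang Autonomous Region Academy of Oceanography and at the Marine Information Technology Innovation Center, Ministry of Natural Resources; his research focuses mainly on big data and artificial intelligence, geographic information systems, and intelligent transportation. Tel: 86-18776157776; E-mail: 45325807@qq.com
  • Funding:
    2019 Open Fund of the Marine Information Technology Innovation Center, Ministry of Natural Resources; National Natural Science Foundation of China (61763007; 61866007); Guangxi Science and Technology Major Project (Guike AA18118025)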

Argo Profile Data Clustering Based on DBIRCH Algorithm

WU Man1,2, ZHANG Wanzhen3, SUN Miao1, LIN Sen4   

  1. Information Department, Guangxi Academy of Oceanography, Nanning 530022, China; 2. Technology Innovation Center of Marine Information, Ministry of Natural Resources, Tianjin 300171, China; 3. Institute of Geography and Oceanography, Guilin University of Aerospace Technology, Guilin 541004, China; 4. School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
  • Received:2020-04-04 Online:2020-09-24 Published:2020-10-22

Abstract: Real-time analysis of profile observations from marine Argo buoys must cope with high data density, respond quickly, and identify clusters of arbitrary shape. To meet these requirements, this paper proposes DBIRCH (Density-Based Balanced Iterative Reducing and Clustering Using Hierarchies), a low-complexity clustering algorithm that processes a data set effectively in a single scan. Using a newly introduced parameter, the density threshold correction factor, the algorithm dynamically updates the subspace threshold, the constraint coefficient that limits the growth of the CF (Clustering Feature) tree. Drawing on the idea of density association, it repeatedly builds CF trees in different neighborhoods and merges them, and finally outputs the child nodes of the core CF tree as the clustering result. This avoids the heavy dependence of the BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) algorithm on its parameters, and the ability to handle clusters of arbitrary shape improves the overall robustness of data processing, the timeliness of processing Argo profile monitoring data, and the overall throughput of the algorithm. To test its overall performance, the algorithm is run on a real data set of real-time Argo buoy profile monitoring data in several groups of comparative experiments with different parameter settings, and it is evaluated with several metrics in terms of running time and clustering accuracy. Taken as a whole, DBIRCH shows the best overall clustering performance among the three algorithms compared: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), BIRCH, and DBIRCH. The experimental results show that BIRCH runs fastest but has the lowest accuracy; DBSCAN clusters better than BIRCH but runs slowest; and the improved DBIRCH is slightly less efficient than BIRCH but achieves the highest clustering accuracy.
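The abstract refers to the CF (Clustering Feature) tree and to a threshold that limits its growth, without defining either. As a rough illustration of the standard BIRCH building block only, the sketch below shows a clustering feature as the triple (point count, linear sum, sum of squared norms) and a threshold test for absorbing a new point. It is not the DBIRCH procedure: the density threshold correction factor and its update rule are not given on this page, so the threshold is a fixed argument here. The names `ClusteringFeature` and `try_absorb` are hypothetical, and using the radius (rather than the diameter) as the test quantity is an assumption.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ClusteringFeature:
    """BIRCH-style summary of a group of points (hypothetical helper class)."""
    n: int                    # number of points summarized
    linear_sum: List[float]   # component-wise sum of the points
    square_sum: float         # sum of squared Euclidean norms of the points

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusteringFeature":
        coords = [float(x) for x in point]
        return cls(1, coords, sum(x * x for x in coords))

    def merged(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CF additivity: the summary of a union is the sum of the summaries.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            self.square_sum + other.square_sum,
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        # Root-mean-square distance of the summarized points from the centroid.
        c = self.centroid()
        variance = self.square_sum / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error


def try_absorb(cf: ClusteringFeature, point: Sequence[float],
               threshold: float) -> Optional[ClusteringFeature]:
    """Return the merged CF if its radius stays within threshold, else None.

    Assumption: a fixed threshold; the source describes a dynamically
    updated threshold whose rule is not specified on this page.
    """
    candidate = cf.merged(ClusteringFeature.from_point(point))
    return candidate if candidate.radius() <= threshold else None
```

For example, starting from the point (0, 0) with a threshold of 1.5, the point (2, 0) is absorbed (the merged radius is 1.0), whereas (10, 0) is then rejected and would start a new entry in a full CF tree. Because only the three summary values are stored, each point is read once, which is what makes a single-scan design possible.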

Key words: Argo buoy, cluster analysis, balanced iterative reducing and clustering using hierarchies (BIRCH) algorithm, density-based spatial clustering of applications with noise (DBSCAN) algorithm, density-based balanced iterative reducing and clustering using hierarchies (DBIRCH) algorithm

CLC number:

  • TP312