Journal of Jilin University (Information Science Edition) ›› 2026, Vol. 44 ›› Issue (3): 656-662.

High-Dimensional Correlation-Based Incomplete Data Block Filling Algorithm for Meteorological Operations

LIU Xingli1, GAO Yue2, BAI Yulan3, SUN Yuan4, LIU Changcheng4

  1. Heilongjiang Provincial Meteorological Data Center, Harbin 150030, China; 2. Harbin Acheng District Meteorological Bureau, Harbin 150399, China; 3. Hegang Meteorological Bureau, Hegang 154106, China; 4. Qiqihar Meteorological Bureau, Qiqihar 161006, China
  • Received: 2025-06-25 Online: 2026-06-02 Published: 2026-06-02
  • Corresponding author: GAO Yue (born 1993), female, from Qiqihar, Heilongjiang; engineer at Harbin Acheng District Meteorological Bureau; research focus: integrated meteorological observation. (Tel) 86-13936098121, (E-mail) 13936394272@163.com
  • About the author: LIU Xingli (born 1978), female, from Huludao, Liaoning; senior engineer at Heilongjiang Provincial Meteorological Data Center; research focus: meteorological information technology and meteorological data processing. (Tel) 18745736121, (E-mail) 6299136@qq.com
  • Supported by:
    Innovation and Development Fund of the Heilongjiang Provincial Meteorological Bureau (HQ2023020)

Abstract: Meteorological operations data span temporal, spatial, and multivariable dimensions. As dimensionality grows, the data become sparser, and correlation patterns differ across temporal and spatial scales. This makes it difficult to map the correlations between meteorological elements and the data, and leads to poor structural similarity in the filling results. To address this, a block-wise filling (imputation) algorithm for high-dimensional correlated missing data in meteorological operations is proposed. Using mutual information, the probability distributions of continuous meteorological variables are approximated with kernel density estimation (KDE), the nonlinear statistical dependence between variables is quantified with the mutual information formula, and a symmetric mutual information matrix is generated to capture the local correlations among meteorological elements. The normalized mutual information matrix is converted into a similarity matrix through an exponential mapping that strengthens strong correlations and weakens weak ones. A Laplacian matrix is then constructed, its eigenvectors are computed, and the k-means algorithm clusters the eigenvectors to partition the attributes into blocks. This block-wise processing divides the meteorological data into strongly correlated sub-blocks, and an independent conditional generative adversarial network (CGAN) is designed for each sub-block. With a purpose-designed loss function, each CGAN is trained so that the model generates imputed values consistent with the distribution of the real meteorological data. Experimental results show that the structural similarity of the filling results obtained with the proposed method remains stable at 0.92, indicating that the method imputes missing data effectively.

Key words: meteorological operations, high-dimensional, correlation, missing data, filling method, conditional generative adversarial network (CGAN)
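
The attribute-blocking stage described in the abstract (KDE-based mutual information, exponential similarity mapping, Laplacian eigenvectors, k-means) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: the Scott's-rule bandwidth and grid evaluation of the KDE, normalization by the largest pairwise mutual information, the mapping exp(-(1 - NMI)/sigma) with sigma = 0.2, the symmetric normalized Laplacian, and the farthest-point k-means initialization. All function names and the synthetic data are hypothetical. The per-block CGAN stage is not shown.

```python
import numpy as np


def kde_mutual_information(x, y, grid_size=32):
    """Estimate I(X;Y) in nats from a product-Gaussian KDE evaluated on a grid."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    h = len(x) ** (-1.0 / 6.0)  # assumed bandwidth: Scott's rule for 2-D data
    grid = np.linspace(-3.5, 3.5, grid_size)
    kx = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
    ky = np.exp(-0.5 * ((grid[:, None] - y[None, :]) / h) ** 2)
    pxy = kx @ ky.T            # unnormalized joint density on the grid
    pxy /= pxy.sum()           # discrete joint distribution over grid cells
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 1e-150        # skip numerically negligible cells
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px * py)[mask])))


def mutual_information_matrix(data):
    """Symmetric matrix of pairwise mutual information between columns."""
    d = data.shape[1]
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            mi[i, j] = mi[j, i] = kde_mutual_information(data[:, i], data[:, j])
    return mi


def similarity_from_mi(mi, sigma=0.2):
    """Assumed mapping: scale MI to [0, 1], then apply exp(-(1 - NMI) / sigma)."""
    nmi = mi / mi.max()
    w = np.exp(-(1.0 - nmi) / sigma)   # strong links stay near 1, weak ones shrink
    np.fill_diagonal(w, 0.0)           # no self-loops in the similarity graph
    return w


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels


def spectral_blocks(w, k):
    """Cluster attributes using eigenvectors of the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    u = vecs[:, :k]
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    return kmeans(u, k)


# Synthetic stand-in for meteorological attributes: two latent drivers t and s.
rng = np.random.default_rng(0)
t, s = rng.normal(size=(2, 400))
noise = rng.normal(size=(4, 400))
data = np.column_stack([
    t,                                  # block A
    s,                                  # block B
    t + 0.2 * noise[0],                 # block A
    np.tanh(2 * s) + 0.1 * noise[1],    # block B, nonlinear link
    -t + 0.2 * noise[2],                # block A, negative link
    s + 0.2 * noise[3],                 # block B
])
labels = spectral_blocks(similarity_from_mi(mutual_information_matrix(data)), k=2)
print(labels)
```

In this sketch, columns that share a latent driver should receive the same block label. Each resulting block would then be handed to its own imputation model, which in the paper is a CGAN per sub-block.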

CLC Number:

  • TP311