Journal of Jilin University Science Edition ›› 2025, Vol. 63 ›› Issue (4): 1137-1142.


Missing Data Filling Method in Time Series Based on Generalized Center Clustering

YU Yanpeng, HUI Xianghui   

  1. College of Information and Management Science (College of Software), Henan Agricultural University, Zhengzhou 450046, China
  • Received: 2024-06-04 Online: 2025-07-26 Published: 2025-07-26

Abstract: Missing values in time series are usually filled using predictions from existing data, but the complexity and uncertainty of time series often introduce errors into those predictions. To ensure effective data filling, we propose a method for filling missing time series data based on generalized center clustering. First, we calculate the distances between objects and classes, as well as between classes, to quantify the relative position of each data point with respect to the cluster centers and obtain the spatial relationships among the data. Second, we use an information bottleneck algorithm to cluster the generalized centers in space, so that time series data containing missing values are grouped into the same class. Finally, we calculate the cluster radius, divide the outlier data produced by generalized center clustering into usable and weakly usable randomly damaged data, set a fluctuation threshold, and compare the randomly damaged data within that threshold against the unified attribute values of the cluster to fill the missing data in the time series. Experimental results show that the method achieves high normalized mutual information and hit rate during clustering and maintains a data replenishment rate above 80% when filling missing data, indicating that it can effectively improve the integrity of time series data.
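
The final filling step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that cluster labels are already available, that the cluster center is the mean of the complete rows, that the cluster radius is the largest distance from a complete row to that center, and that a damaged row counts as "usable" when its distance on the observed coordinates is within the radius enlarged by a fluctuation threshold. The names fill_by_cluster, replenishment_rate, and fluctuation are hypothetical.

```python
import numpy as np


def fill_by_cluster(X, labels, fluctuation=0.2):
    """Fill NaNs with cluster-center values (illustrative assumptions only)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    filled = X.copy()
    for k in np.unique(labels):
        members = np.where(labels == k)[0]
        block = X[members]
        # Rows without missing values define the center and the radius.
        complete = block[~np.isnan(block).any(axis=1)]
        if len(complete) == 0:
            continue  # no reference rows in this cluster
        center = complete.mean(axis=0)
        radius = np.linalg.norm(complete - center, axis=1).max()
        for i in members:
            missing = np.isnan(X[i])
            if not missing.any() or missing.all():
                continue
            # Distance to the center on the observed coordinates only.
            d = np.linalg.norm(X[i, ~missing] - center[~missing])
            # Assumed rule: rows inside the enlarged radius are "usable".
            if d <= radius * (1.0 + fluctuation):
                filled[i, missing] = center[missing]
    return filled


def replenishment_rate(X, filled):
    """Share of originally missing entries that were filled."""
    before = np.isnan(np.asarray(X, dtype=float)).sum()
    after = np.isnan(np.asarray(filled, dtype=float)).sum()
    return 0.0 if before == 0 else (before - after) / before
```

Under these assumptions, a damaged row that lies far from its cluster center is treated as weakly usable and left unfilled, which is why the replenishment rate can fall below 100%.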

Key words: generalized center clustering, time series, missing data filling, information bottleneck, randomly damaged data, replenishment rate

CLC Number: TP391