Journal of Jilin University (Information Science Edition) ›› 2022, Vol. 40 ›› Issue (4): 616-620.

Missing Value Interpolation for Medical Big Data Based on Missing Forest

BAI Hongtao a,b , LUAN Xue a , HE Lili a,b , BI Yaru c , ZHANG Tingting b , SUN Chenglin c   

  1. a. College of Software; b. College of Computer Science and Technology, Jilin University, Changchun 130022, China; c. First Hospital, Jilin University, Changchun 130012, China
  • Received: 2022-03-29 Online: 2022-08-16 Published: 2022-08-17

Abstract: To address the adverse effects of missing data in medical datasets on classifier performance and on downstream tasks, we use the missing forest interpolation method to interpolate missing values in medical datasets. The method first trains a random forest model on the observed, complete data in the dataset. The trained random forest model is then used to predict the missing data. Finally, this process is repeated iteratively until the interpolation of the missing data is complete. On two medical datasets, evaluated with the NRMSE (Normalized Root Mean Squared Error) and PFC (Proportion of Falsely Classified) metrics, the missing forest interpolation method achieves lower error and better interpolation results than K-nearest neighbor interpolation, multiple interpolation, and GAIN (Generative Adversarial Imputation Nets) interpolation. The stability of the missing forest interpolation method is demonstrated by analyzing the dose-response relationship between alanine aminotransferase (ALT) and diabetes on the diabetes dataset.
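The abstract names the two evaluation metrics but does not give their formulas. The following is a minimal sketch, assuming the definitions commonly used with missing forest: NRMSE is the root of the mean squared error divided by the variance of the true values, and PFC is the share of categorical entries filled in with the wrong category, both computed only over entries that were originally missing. The function names `nrmse` and `pfc`, and the choice of population variance, are illustrative assumptions and not taken from the paper.

```python
import math
from statistics import pvariance


def nrmse(true_vals, imputed_vals):
    """Normalized RMSE for continuous entries that were originally missing.

    Assumed form: sqrt(mean((true - imputed)^2) / var(true)).
    Population variance is used here; this is an assumption of the sketch.
    """
    n = len(true_vals)
    mse = sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / n
    return math.sqrt(mse / pvariance(true_vals))


def pfc(true_labels, imputed_labels):
    """Proportion of falsely classified categorical entries among the
    entries that were originally missing."""
    wrong = sum(t != p for t, p in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)


# Toy values only, not data from the paper.
true_cont = [1.0, 2.0, 3.0, 4.0]
imputed_cont = [1.0, 2.0, 3.0, 6.0]
true_cat = ["a", "b", "a", "c"]
imputed_cat = ["a", "b", "c", "c"]

print(nrmse(true_cont, imputed_cont))
print(pfc(true_cat, imputed_cat))
```

Under these definitions both metrics equal 0 for a perfect interpolation, and lower values indicate better results, which is the sense in which the abstract compares the methods.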

Key words: missing data interpolation; missing forest interpolation; big data; alanine aminotransferase (ALT) and diabetes dose-response

CLC Number: TP391