Missing Value Interpolation for Medical Big Data Based on Missing Forest

Abstract

Missing data in medical datasets degrade classifier performance and downstream tasks. To address this, we use the missing forest imputation method to fill in missing values in medical datasets. The method first trains a random forest model on the observed (complete) data in the dataset, then uses the trained model to predict the missing data, and finally repeats this process iteratively until the imputation is complete. On two medical datasets, measured by the normalized root mean squared error (NRMSE) and the proportion of falsely classified entries (PFC), missing forest imputation achieves lower error and better imputation quality than K-nearest neighbor imputation, multiple imputation, and Generative Adversarial Imputation Nets (GAIN). The stability of the method is demonstrated by analyzing the dose-response relationship between alanine aminotransferase (ALT) and diabetes on the diabetes dataset.
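The train, predict, and repeat loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy and scikit-learn are available, handles numeric columns only, and uses a simplified stopping rule. The function names `miss_forest_numeric`, `nrmse`, and `pfc` are hypothetical, and the metric formulas follow one common definition (error computed over the originally missing entries), which the abstract itself does not spell out.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def miss_forest_numeric(X, max_iter=5, n_estimators=50, seed=0):
    """Impute NaNs in a 2-D numeric array with per-column random forests."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: fill every gap with its column mean.
    col_means = np.nanmean(X, axis=0)
    X_imp = np.where(missing, col_means, X)
    # Visit columns from fewest to most missing values.
    order = np.argsort(missing.sum(axis=0))
    for _ in range(max_iter):
        X_prev = X_imp.copy()
        for j in order:
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            others = np.delete(X_imp, j, axis=1)
            rf = RandomForestRegressor(n_estimators=n_estimators,
                                       random_state=seed)
            # Train on rows where column j was observed ...
            rf.fit(others[~miss_j], X_imp[~miss_j, j])
            # ... then predict the rows where it was missing.
            X_imp[miss_j, j] = rf.predict(others[miss_j])
        # Simplified stopping rule (an assumption of this sketch):
        # stop once the imputed matrix barely changes between passes.
        change = np.sum((X_imp - X_prev) ** 2) / np.sum(X_imp ** 2)
        if change < 1e-6:
            break
    return X_imp


def nrmse(X_true, X_imp, missing):
    """Normalized RMSE over the originally missing numeric entries."""
    diff = X_true[missing] - X_imp[missing]
    return float(np.sqrt(np.mean(diff ** 2) / np.var(X_true[missing])))


def pfc(labels_true, labels_imp, missing):
    """Proportion of falsely classified entries among missing categorical ones."""
    return float(np.mean(labels_true[missing] != labels_imp[missing]))


if __name__ == "__main__":
    # Synthetic, strongly correlated data with about 10% of entries masked.
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    X_true = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=200),
                              -x + 0.1 * rng.normal(size=200)])
    mask = rng.random(X_true.shape) < 0.1
    X_obs = X_true.copy()
    X_obs[mask] = np.nan
    X_imp = miss_forest_numeric(X_obs)
    print("NRMSE:", nrmse(X_true, X_imp, mask))
```

Both metrics are zero for a perfect imputation. Under this NRMSE definition, a value near 1 is roughly what filling every gap with a constant mean would give, so lower values indicate that the forest is exploiting relationships between columns.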

Key words: missing data imputation; missing forest imputation; big data; dose-response relationship between alanine aminotransferase (ALT) and diabetes

CLC Number: TP391

BAI Hongtao, LUAN Xue, HE Lili, BI Yaru, ZHANG Tingting, SUN Chenglin. Missing Value Interpolation for Medical Big Data Based on Missing Forest [J]. Journal of Jilin University (Information Science Edition), 2022, 40(4): 616-620.
