Authors
Tabea Kossen, Michelle Livne, Vince I Madai, Ivana Galinovic, Dietmar Frey, Jochen B Fiebach
Publication date
2019/1/1
Journal
bioRxiv
Pages
773762
Publisher
Cold Spring Harbor Laboratory
Description
Background and purpose
Handling missing values is a prevalent challenge in the analysis of clinical data. The rise of data-driven models demands efficient use of the available data, so methods to impute missing values are crucial. Here, we developed a publicly available framework to test different imputation methods and compared their impact on a typical clinical stroke dataset as a use case.
Methods
A clinical dataset based on the 1000Plus stroke study, comprising 380 patients with complete entries, was used. Thirteen common clinical parameters, including numerical and categorical values, were selected. Missing values were simulated at rates from 0% to 60% under missing-at-random (MAR) and missing-completely-at-random (MCAR) mechanisms and subsequently either imputed using the mean, hot-deck, multiple imputation by chained equations, or expectation maximization methods, or handled by listwise deletion. Performance was assessed by the root mean squared error, the absolute bias, and the performance of a linear model predicting discharge mRS.
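The simulate-then-impute evaluation described above can be illustrated with a minimal sketch. It is not the authors' framework: it covers only the MCAR mechanism with mean imputation, uses synthetic numerical data in place of the 1000Plus dataset, and all function names and the 20% missing rate are assumptions chosen for illustration.

```python
import numpy as np

def simulate_mcar(X, rate, rng):
    """Set a fraction `rate` of entries to NaN, independently of the data (MCAR)."""
    mask = rng.random(X.shape) < rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask

def mean_impute(X_missing):
    """Replace each NaN with the mean of the observed values in its column."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)

def rmse_on_missing(X_true, X_imputed, mask):
    """Root mean squared error, computed only over the entries that were removed."""
    diff = (X_true - X_imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

rng = np.random.default_rng(0)
# Synthetic stand-in: 380 rows and 13 numerical columns (assumption, not the real data).
X = rng.normal(size=(380, 13))

X_missing, mask = simulate_mcar(X, rate=0.2, rng=rng)
X_imputed = mean_impute(X_missing)
error = rmse_on_missing(X, X_imputed, mask)

# Listwise deletion keeps only rows without any missing value.
complete_rows = int((~mask.any(axis=1)).sum())

print(f"RMSE of mean imputation on removed entries: {error:.3f}")
print(f"Rows left after listwise deletion: {complete_rows} of {X.shape[0]}")
```

Counting the rows that survive listwise deletion shows why that strategy loses data quickly: with 13 variables, even a moderate per-entry missing rate leaves few fully observed rows.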
Results
Listwise deletion was the worst-performing method and became significantly worse than every imputation method from 2% (MAR) and 3% (MCAR) missing values onward. The underlying missing value mechanism seemed to have a crucial influence on which imputation method performed best. Consequently, no single imputation method outperformed all others. A significant performance drop of the linear model started from 11% (MAR+MCAR) and 18% (MCAR) missing values.
Conclusions
In the presented case study of a typical clinical stroke dataset we confirmed that listwise deletion should be avoided for …