Authors
Tebogo Makaba, Eustace Dogo
Publication date
2019/11/21
Conference
2019 International Multidisciplinary Information Technology and Engineering Conference (IMITEC)
Pages
1-7
Publisher
IEEE
Description
Handling missing values is an important feature engineering task in data science, because missing data can degrade the predictive accuracy of machine learning classification models. However, it is often unclear what causes the missing values in real-life data, that is, which missing data mechanism is responsible for the missingness. It therefore becomes necessary to evaluate several missing data approaches for a given dataset. In this paper, we perform a comparative study of several approaches for handling missing values in data, namely listwise deletion, mean, mode, k-nearest neighbors, expectation-maximization, and multiple imputation by chained equations. The comparison is performed on two real-world datasets, using the following evaluation metrics: accuracy, root mean squared error, receiver operating characteristic, and the F1 score. Most classifiers performed well across the …
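To make the simpler approaches named in the description concrete, here is a minimal sketch in plain Python of listwise deletion, mean and mode imputation, and a basic k-nearest-neighbors imputation. It is not the paper's implementation: the toy `rows` data, the function names, and the distance rule used for partially observed rows are all assumptions made for illustration. Expectation-maximization and multiple imputation by chained equations are omitted because they do not fit in a short example.

```python
from math import sqrt
from statistics import mean, mode

# Hypothetical toy data (not from the paper): None marks a missing value.
rows = [
    [1.0, 2.0],
    [2.0, None],
    [2.0, 6.0],
    [4.0, 8.0],
    [None, 2.0],
]


def listwise_deletion(data):
    """Drop every row that contains at least one missing value."""
    return [row[:] for row in data if None not in row]


def impute_column_stat(data, stat):
    """Fill each gap with a statistic (e.g. mean or mode) of the column's observed values."""
    n_cols = len(data[0])
    fills = [stat([r[j] for r in data if r[j] is not None]) for j in range(n_cols)]
    return [[fills[j] if r[j] is None else r[j] for j in range(n_cols)] for r in data]


def _distance(a, b):
    """Root mean squared difference over the columns observed in both rows (an assumed rule)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")  # no common columns: treat as maximally distant
    return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def impute_knn(data, k=2):
    """Fill each gap with the mean of that column over the k nearest rows that observe it."""
    result = []
    for i, row in enumerate(data):
        new_row = row[:]
        for j, value in enumerate(row):
            if value is None:
                donors = [
                    (_distance(row, other), other[j])
                    for m, other in enumerate(data)
                    if m != i and other[j] is not None
                ]
                donors.sort(key=lambda d: d[0])
                new_row[j] = mean([v for _, v in donors[:k]])
        result.append(new_row)
    return result


if __name__ == "__main__":
    print("listwise:", listwise_deletion(rows))
    print("mean:    ", impute_column_stat(rows, mean))
    print("mode:    ", impute_column_stat(rows, mode))
    print("knn:     ", impute_knn(rows, k=2))
```

In a comparison like the one described, each completed dataset would then be passed to the same classifiers and scored with the same metrics, so that differences in the scores can be attributed to how the missing values were handled.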
Total citations
[Chart: citations per year, 2020-2024]