Data Imputation using Correlation-based Machine Learning Algorithms

B Aruna Devi, N Karthik - … Conference on Intelligent Systems Design and …, 2023 - Springer
International Conference on Intelligent Systems Design and Applications, 2023, Springer
Abstract
Diabetes mellitus is a condition that affects how the body uses blood sugar. If diabetes is not properly managed or detected in time, it can be fatal; it may also damage vital organs such as the kidneys, heart, blood vessels, nerves, and eyes. In medical research, missing data is one of the main problems acknowledged by statisticians and medical researchers, and if its effect is ignored, the results will usually be biased. Data collected in real-world applications often contain missing values in one or more variables due to machine and human errors. Missing values must be handled properly because incomplete datasets are less valuable for data analysis. One way to deal with missing data is to reduce the dataset to the subset of records that have no missing values; if the dataset is small, however, this can cause significant data loss. Correlation is important in medical datasets because distinct features are often different symptoms of the same medical condition, which implies a strong dependency between them. Hence, this work investigates five imputation strategies (Mean, K Nearest Neighbor (KNN), Predictive Mean Matching (PMM), Random Forest (RF), and Classification and Regression Tree (CART)) based on attribute correlation for estimating missing values in the Pima Indian diabetes dataset. Root-mean-square error (RMSE) and mean absolute error (MAE) were used to assess the efficacy of the imputation methods. The findings show that, for some features, attribute correlation-based imputation performs marginally better than attribute correlation-free imputation.
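The abstract does not spell out how attribute correlation enters each imputation model. One plausible reading, sketched here in Python purely as an illustration, is to impute a target attribute using only the attributes whose absolute Pearson correlation with it meets a threshold, then to score the result by hiding known values and comparing them with the imputed ones via RMSE and MAE. Everything in this sketch is an assumption rather than the authors' method: the 0.3 threshold, the helper names (`correlated_columns`, `knn_impute`, `mask`, `score`), the choice of k = 2, and the tiny made-up dataset, which is not the Pima data. Only the Mean and KNN strategies are shown; PMM, RF, and CART are not reproduced.

```python
import math
from statistics import mean

# Rows are lists of floats; None marks a missing value.

def rmse(true_vals, imputed_vals):
    """Root-mean-square error between true and imputed values."""
    errs = [(t - p) ** 2 for t, p in zip(true_vals, imputed_vals)]
    return math.sqrt(sum(errs) / len(errs))

def mae(true_vals, imputed_vals):
    """Mean absolute error between true and imputed values."""
    errs = [abs(t - p) for t, p in zip(true_vals, imputed_vals)]
    return sum(errs) / len(errs)

def pearson(rows, i, j):
    """Pearson correlation of columns i and j over rows where both are observed."""
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    if len(pairs) < 2:
        return 0.0
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def correlated_columns(rows, target, threshold=0.3):
    """Hypothetical selection rule: keep columns with |r| >= threshold vs. the target."""
    return [c for c in range(len(rows[0]))
            if c != target and abs(pearson(rows, target, c)) >= threshold]

def mean_impute(rows, target):
    """Correlation-free baseline: fill missing target values with the column mean."""
    col_mean = mean(r[target] for r in rows if r[target] is not None)
    return [col_mean if r[target] is None else r[target] for r in rows]

def knn_impute(rows, target, predictors, k=2):
    """Fill each missing target value with the mean target of its k nearest donors.

    Distance is computed over `predictors` only. Features are not standardized
    here; on real data with differently scaled columns, standardize them first.
    Assumes at least one donor row (target and all predictors observed).
    """
    donors = [r for r in rows
              if r[target] is not None and all(r[c] is not None for c in predictors)]
    out = []
    for r in rows:
        if r[target] is not None:
            out.append(r[target])
            continue
        usable = [c for c in predictors if r[c] is not None]  # skip this row's own gaps
        # Rank donors by squared Euclidean distance (same order as Euclidean).
        nearest = sorted(donors,
                         key=lambda d: sum((r[c] - d[c]) ** 2 for c in usable))[:k]
        out.append(mean(d[target] for d in nearest))
    return out

def mask(rows, target, idx):
    """Copy rows and hide the target value in the given row indices."""
    masked = [list(r) for r in rows]
    for i in idx:
        masked[i][target] = None
    return masked

def score(true_rows, imputed, target, idx):
    """(RMSE, MAE) of the imputed target values at the hidden positions only."""
    t = [true_rows[i][target] for i in idx]
    p = [imputed[i] for i in idx]
    return rmse(t, p), mae(t, p)

# Made-up toy data, NOT the Pima dataset: column 2 is exactly 2 * column 0,
# and column 1 is only weakly related to column 2.
TRUE_ROWS = [
    [1.0, 5.0, 2.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
    [5.0, 3.0, 10.0],
    [6.0, 5.0, 12.0],
]
TARGET = 2
MISSING = [2, 4]  # rows whose target value is hidden and then re-imputed

if __name__ == "__main__":
    rows = mask(TRUE_ROWS, TARGET, MISSING)
    all_cols = [c for c in range(len(rows[0])) if c != TARGET]
    results = {
        "mean": mean_impute(rows, TARGET),
        "knn_all": knn_impute(rows, TARGET, all_cols),
        "knn_corr": knn_impute(rows, TARGET, correlated_columns(rows, TARGET)),
    }
    # Expected here: mean 2.500/2.000, knn_all 0.707/0.500, knn_corr 0.000/0.000
    for name, imputed in results.items():
        r, a = score(TRUE_ROWS, imputed, TARGET, MISSING)
        print(f"{name:9s} RMSE={r:.3f} MAE={a:.3f}")
```

On this contrived data, KNN restricted to the correlated column recovers both hidden values exactly, while KNN over all attributes is pulled off target by the weakly related column. This only illustrates the mechanism; it says nothing about the size of the effect the paper reports on the Pima dataset.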