Performance Analysis of Resampling and Ensemble Learning Methods on Diabetes Detection as Imbalanced Dataset

FIE Sari, FW Edlim, FA Ramadhan, DA Navastara
2022 Fifth International Conference on Vocational Education and …, 2022 - ieeexplore.ieee.org
Early detection of diabetes is essential to reducing its high mortality rate. Early detection can be achieved by assessing the likelihood of diabetes from the variables recorded in diabetes patients' data. Diagnosing patients from medical data is challenging because such data are usually imbalanced: negative cases severely outnumber positive cases. To preprocess the imbalanced data, this paper designs an algorithm that combines resampling techniques with an ensemble learning algorithm. The oversampling techniques are ADASYN, random oversampling (ROS), and SMOTE, while the undersampling techniques are random undersampling (RUS), Tomek links, and Edited Nearest Neighbours (ENN). The combined techniques SMOTE-ENN and SMOTE-Tomek are also used to handle the highly imbalanced diabetes dataset. The ensemble learning algorithms used are Random Forest, Bagging, AdaBoost, and XGBoost. In the experiments, the best performance is obtained with SMOTE-ENN combined with AdaBoost, which achieves a recall of 0.7330, although its F1-score is 0.6459. The AdaBoost classifier also gives good and stable results across the different resampling techniques. Compared with the model trained on the original data, using SMOTE-ENN increases recall by 0.1819 and decreases the F1-score by 0.2000. In medical diagnosis, higher sensitivity (recall) matters more than the F1-score, because recall measures how many patients who actually have the disease are correctly identified.
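The trade-off reported above, with recall rising while F1 falls, follows from how the two metrics are defined. Recall is TP / (TP + FN), while F1 is the harmonic mean of precision and recall, equivalently 2TP / (2TP + FP + FN). Resampling the minority class can make a classifier flag more patients as positive, which can raise true positives and false positives together. The minimal Python sketch below illustrates this effect. The `Confusion` class and all counts are hypothetical illustrations, not data or code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts; 'positive' means diabetic (hypothetical helper)."""
    tp: int  # diabetic patients correctly flagged
    fp: int  # non-diabetic patients wrongly flagged
    fn: int  # diabetic patients missed

    def precision(self) -> float:
        # Fraction of flagged patients who are actually diabetic.
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    def recall(self) -> float:
        # Recall (sensitivity): fraction of diabetic patients who are detected.
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


# Hypothetical counts for a test set with 100 diabetic patients.
# These are illustrative numbers, NOT results from the paper.
baseline = Confusion(tp=55, fp=15, fn=45)   # model trained on the imbalanced data
resampled = Confusion(tp=75, fp=85, fn=25)  # model trained after resampling

for name, c in [("baseline", baseline), ("resampled", resampled)]:
    print(f"{name:9s} recall={c.recall():.4f} "
          f"precision={c.precision():.4f} f1={c.f1():.4f}")
```

With these made-up counts, recall rises from 0.55 to 0.75 because 20 more diabetic patients are caught. However, 70 extra false alarms pull precision down, so F1 drops from about 0.647 to about 0.577. This is the same direction of change the abstract reports for SMOTE-ENN.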