Machine learning with oversampling and undersampling techniques: overview study and experimental results

Authors

Roweida Mohammed, Jumanah Rawashdeh, Malak Abdullah

Publication date

2020/4/7

Source

2020 11th International Conference on Information and Communication Systems (ICICS)

Pages

243-248

Publisher

IEEE

Description

Data imbalance in machine learning refers to an unequal distribution of classes within a dataset. This issue is encountered mostly in classification tasks in which the distribution of classes or labels in a given dataset is not uniform. The straightforward way to address it is resampling: adding records to the minority class or deleting records from the majority class. In this paper, we have experimented with the two widely adopted resampling techniques: oversampling and undersampling. To explore both techniques, we chose a public imbalanced dataset from the Kaggle website, Santander Customer Transaction Prediction, and applied a group of well-known machine learning algorithms with different hyperparameters that give the best results for both resampling techniques. One of the key findings of this paper is that oversampling performs better than undersampling for different …
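The two resampling strategies named in the description can be illustrated with a minimal sketch. It is not the authors' code: the function names `random_oversample` and `random_undersample`, the fixed seed, and the toy 90/10 dataset are all assumptions made for illustration, and the paper's actual experiments use the Santander dataset with its own tooling. The sketch uses plain random resampling, which is only one of several possible variants.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect the records belonging to each class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen records of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sampling with replacement: minority records may repeat.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Discard randomly chosen records of the larger classes
    until every class is as small as the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sampling without replacement: each kept record is distinct.
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: 90 majority records (label 0), 10 minority (label 1).
rows = [[float(i)] for i in range(100)]
labels = [0] * 90 + [1] * 10

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(over_labels))
print(Counter(under_labels))
```

On this toy data, oversampling leaves 90 records in each class and undersampling leaves 10 in each. In practice, resampling is normally applied to the training split only, so that the evaluation data keeps the original class distribution.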

Total citations

Cited by 555

Citations per year: 2020: 2, 2021: 31, 2022: 148, 2023: 237, 2024: 133
