Authors
Haseeb Ali, Mohd Najib Mohd Salleh, Kashif Hussain, Arshad Ahmad, Ayaz Ullah, Arshad Muhammad, Rashid Naseem, Muzammil Khan
Publication date
2019/1
Journal
International Journal of Engineering & Technology
Volume
8
Issue
3
Pages
390-397
Description
Data mining methods are often impaired by datasets of a disparate nature. Such real-world datasets have imbalanced class distributions, which negatively affect the learning process. In this scenario, the number of samples belonging to one class (the majority class) substantially exceeds the number of samples of the other class (the minority class), causing classification methods to ignore the minority class. To address this, various data preprocessing approaches are considered mandatory for developing an effective model with contemporary data mining algorithms. Oversampling and undersampling are two fundamental preprocessing approaches for balancing the class distribution of a dataset. In this study, we thoroughly discuss preprocessing techniques and approaches, as well as the challenges researchers face in overcoming the weaknesses of resampling techniques. This paper highlights the basic issues of classifiers that are biased toward the majority class and ignore the minority class. Additionally, we synthesize viable solutions and potential suggestions on how to handle these problems effectively during data preprocessing, and we present open issues that call for further research.
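The two basic resampling strategies named in the abstract can be illustrated with a minimal sketch. Random oversampling duplicates randomly chosen minority-class samples until the classes are equal in size; random undersampling discards randomly chosen majority-class samples down to the minority-class size. The function names `random_oversample` and `random_undersample` are hypothetical and are not taken from the paper, which surveys many more elaborate variants. Libraries such as imbalanced-learn offer maintained equivalents (`RandomOverSampler`, `RandomUnderSampler`).

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of each smaller
    class until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)  # sample with replacement from this class
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Hypothetical helper: keep a random subset of each class so that every
    class is reduced to the size of the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i in sorted(rng.sample(idx, target)):  # sample without replacement
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


if __name__ == "__main__":
    # Toy 80/20 dataset: class 0 is the majority, class 1 the minority.
    X = [[float(i)] for i in range(10)]
    y = [0] * 8 + [1] * 2

    # A classifier that always predicts the majority class already reaches
    # 80% accuracy while never recognizing the minority class.
    print(y.count(0) / len(y))  # 0.8

    _, y_over = random_oversample(X, y)
    _, y_under = random_undersample(X, y)
    print(Counter(y_over))   # Counter({0: 8, 1: 8})
    print(Counter(y_under))  # Counter({0: 2, 1: 2})
```

Oversampling keeps all original information but can encourage overfitting to duplicated minority samples, while undersampling avoids duplicates at the cost of discarding majority-class data. This trade-off is one of the resampling weaknesses the abstract refers to.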
Total citations
[Chart: citations per year, 2020-2024]
Scholar articles
H Ali, MNM Salleh, K Hussain, A Ahmad, A Ullah… - International Journal of Engineering & Technology, 2019