SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary

A Fernández, S Garcia, F Herrera, NV Chawla - Journal of artificial …, 2018 - jair.org
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is
considered "de facto" standard in the framework of learning from imbalanced data. This is …

Big data preprocessing: methods and prospects

S García, S Ramírez-Gallego, J Luengo, JM Benítez… - Big data analytics, 2016 - Springer
The massive growth in the scale of data has been observed in recent years being a key
factor of the Big Data scenario. Big Data can be defined as high volume, velocity and variety …

The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art

S Susan, A Kumar - Engineering Reports, 2021 - Wiley Online Library
This survey paper focuses on one of the current primary issues challenging data mining
researchers experimenting on real‐world datasets. The problem is that of imbalanced class …

An insight into imbalanced big data classification: outcomes and challenges

A Fernández, S del Río, NV Chawla… - Complex & Intelligent …, 2017 - Springer
Big Data applications are emerging during the last years, and researchers from many
disciplines are aware of the high advantages related to the knowledge extraction from this …

An oversampling method for class imbalance problems on large datasets

F Rodríguez-Torres, JF Martínez-Trinidad… - Applied Sciences, 2022 - mdpi.com
Several oversampling methods have been proposed for solving the class imbalance
problem. However, most of them require searching the k-nearest neighbors to generate …

Experimental evaluation of ensemble classifiers for imbalance in big data

M Juez-Gil, Á Arnaiz-González, JJ Rodríguez… - Applied soft …, 2021 - Elsevier
Datasets are growing in size and complexity at a pace never seen before, forming ever
larger datasets known as Big Data. A common problem for classification, especially in Big …

Imbalanced big data classification: a distributed implementation of smote

AK Rastogi, N Narang, ZA Siddiqui - … of the workshop program of the …, 2018 - dl.acm.org
In the domain of machine learning, the quality of data is the most critical component for building
good models. Predictive analytics is an AI stream used to predict future events based on …

A survey on big data pre-processing

Z Guan, T Ji, X Qian, Y Ma… - … /2nd Intl Conf on Big Data …, 2017 - ieeexplore.ieee.org
In this paper, we briefly introduce some basic concepts and characteristics of big data. We
are surrounded by a massive amount of data but starving for knowledge. In the era of Big …

Imbalanced data preprocessing for big data

J Luengo, D García-Gil, S Ramírez-Gallego… - Big Data Preprocessing …, 2020 - Springer
The negative impact on learning associated with imbalanced proportion of classes has
exploded lately with the exponential growth of “cheap” data. Many real-world problems …

Imbalanced classification for big data

A Fernández, S García, M Galar, RC Prati… - … from imbalanced data …, 2018 - Springer
New developments in computation have allowed an explosion for both data generation and
storage. The high value that is hidden within this large volume of data has attracted more …