A survey on addressing high-class imbalance in big data

JL Leevy, TM Khoshgoftaar, RA Bauder, N Seliya - Journal of Big Data, 2018 - Springer
In a majority–minority classification problem, class imbalance in the dataset(s) can
dramatically skew the performance of classifiers, introducing a prediction bias for the …

Learning from imbalanced data: open challenges and future directions

B Krawczyk - Progress in artificial intelligence, 2016 - Springer
Despite more than two decades of continuous development learning from imbalanced data
is still a focus of intense research. Starting as a problem of skewed distributions of binary …

A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and …

S González, S García, J Del Ser, L Rokach, F Herrera - Information Fusion, 2020 - Elsevier
Ensembles, especially ensembles of decision trees, are one of the most popular and
successful techniques in machine learning. Recently, the number of ensemble-based …

Big data preprocessing: methods and prospects

S García, S Ramírez-Gallego, J Luengo, JM Benítez… - Big data analytics, 2016 - Springer
The massive growth in the scale of data has been observed in recent years being a key
factor of the Big Data scenario. Big Data can be defined as high volume, velocity and variety …

Machine learning meets omics: applications and perspectives

R Li, L Li, Y Xu, J Yang - Briefings in Bioinformatics, 2022 - academic.oup.com
The innovation of biotechnologies has allowed the accumulation of omics data at an
alarming rate, thus introducing the era of 'big data'. Extracting inherent valuable knowledge …

Imbalanced deep learning by minority class incremental rectification

Q Dong, S Gong, X Zhu - IEEE transactions on pattern analysis …, 2018 - ieeexplore.ieee.org
Model learning from class imbalanced training data is a long-standing and significant
challenge for machine learning. In particular, existing deep learning methods consider …

A Pearson's correlation coefficient based decision tree and its parallel implementation

Y Mu, X Liu, L Wang - Information Sciences, 2018 - Elsevier
In this paper, a Pearson's correlation coefficient based decision tree (PCC-Tree) is
established and its parallel implementation is developed in the framework of Map-Reduce …

kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data

J Maillo, S Ramírez, I Triguero, F Herrera - Knowledge-Based Systems, 2017 - Elsevier
The k-Nearest Neighbors classifier is a simple yet effective widely renowned
method in data mining. The actual application of this model in the big data domain is not …

Learning imbalanced datasets based on SMOTE and Gaussian distribution

T Pan, J Zhao, W Wu, J Yang - Information Sciences, 2020 - Elsevier
The learning of imbalanced datasets is a ubiquitous challenge for researchers in the fields of
data mining and machine learning. Conventional classifiers are often biased towards the …

Improving K-means clustering with enhanced Firefly Algorithms

H Xie, L Zhang, CP Lim, Y Yu, C Liu, H Liu… - Applied Soft …, 2019 - Elsevier
In this research, we propose two variants of the Firefly Algorithm (FA), namely inward
intensified exploration FA (IIEFA) and compound intensified exploration FA (CIEFA), for …