Big data preprocessing: methods and prospects

S García, S Ramírez-Gallego, J Luengo, JM Benítez… - Big Data Analytics, 2016 - Springer
The massive growth in the scale of data has been observed in recent years, being a key
factor of the Big Data scenario. Big Data can be defined as high volume, velocity and variety …

A review of microarray datasets and applied feature selection methods

V Bolón-Canedo, N Sánchez-Maroño… - Information …, 2014 - Elsevier
Microarray data classification is a difficult challenge for machine learning researchers due to
its high number of features and the small sample sizes. Feature selection has been soon …

Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study

I Triguero, S García, F Herrera - Knowledge and Information Systems, 2015 - Springer
Semi-supervised classification methods are suitable tools to tackle training sets with large
amounts of unlabeled data and a small quantity of labeled data. This problem has been …

Deep learning for missing value imputation of continuous data and the effect of data discretization

WC Lin, CF Tsai, JR Zhong - Knowledge-Based Systems, 2022 - Elsevier
Often real-world datasets are incomplete and contain some missing attribute values.
Furthermore, many data mining and machine learning techniques cannot directly handle …

Tutorial on practical tips of the most influential data preprocessing algorithms in data mining

S García, J Luengo, F Herrera - Knowledge-Based Systems, 2016 - Elsevier
Data preprocessing is a major and essential stage whose main goal is to obtain final data
sets that can be considered correct and useful for further data mining algorithms. This paper …

KEEL 3.0: an open source software for multi-stage analysis in data mining

I Triguero, S González, JM Moyano, S García… - International Journal of …, 2017 - Springer
This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source
Java framework (GPLv3 license) that provides a number of modules to perform a wide …

Feature selection based on improved ant colony optimization for online detection of foreign fiber in cotton

X Zhao, D Li, B Yang, C Ma, Y Zhu, H Chen - Applied Soft Computing, 2014 - Elsevier
Feature selection plays an important role in the machine-vision-based online detection of
foreign fibers in cotton because it improves detection accuracy and speed. Feature sets …

ExploreKit: Automatic feature generation and selection

G Katz, ECR Shin, D Song - 2016 IEEE 16th International …, 2016 - ieeexplore.ieee.org
Feature generation is one of the challenging aspects of machine learning. We present
ExploreKit, a framework for automated feature generation. ExploreKit generates a large set …

Rise of the planet of serverless computing: A systematic review

J Wen, Z Chen, X Jin, X Liu - ACM Transactions on Software …, 2023 - dl.acm.org
Serverless computing is an emerging cloud computing paradigm, being adopted to develop
a wide range of software applications. It allows developers to focus on the application logic …

Data preprocessing in predictive data mining

SAN Alexandropoulos, SB Kotsiantis… - The Knowledge …, 2019 - cambridge.org
A large variety of issues influence the success of data mining on a given problem. Two
primary and important issues are the representation and the quality of the dataset …