An insight into imbalanced big data classification: outcomes and challenges

A Fernández, S del Río, NV Chawla… - Complex & Intelligent …, 2017 - Springer
Big Data applications are emerging during the last years, and researchers from many
disciplines are aware of the high advantages related to the knowledge extraction from this …

An up-to-date comparison of state-of-the-art classification algorithms

C Zhang, C Liu, X Zhang, G Almpanidis - Expert Systems with Applications, 2017 - Elsevier
Current benchmark reports of classification algorithms generally concern common classifiers
and their variants but do not include many algorithms that have been introduced in recent …

PMLB: a large benchmark suite for machine learning evaluation and comparison

RS Olson, W La Cava, P Orzechowski, RJ Urbanowicz… - BioData Mining, 2017 - Springer
Background: The selection, development, or comparison of machine learning methods in
data mining can be a difficult task based on the target problem and goals of a particular …

An efficient instance selection algorithm for k nearest neighbor regression

Y Song, J Liang, J Lu, X Zhao - Neurocomputing, 2017 - Elsevier
Abstract: The k-Nearest Neighbor algorithm (kNN) is an algorithm that is very simple to
understand for classification or regression. It is also a lazy algorithm that does not use the …

kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data

J Maillo, S Ramírez, I Triguero, F Herrera - Knowledge-Based Systems, 2017 - Elsevier
Abstract: The k-Nearest Neighbors classifier is a simple yet effective widely renowned
method in data mining. The actual application of this model in the big data domain is not …

Classification of imbalanced data by oversampling in kernel space of support vector machines

J Mathew, CK Pang, M Luo… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Historical data sets for fault stage diagnosis in industrial machines are often imbalanced and
consist of multiple categories or classes. Learning discriminative models from such data sets …

KEEL 3.0: an open source software for multi-stage analysis in data mining

I Triguero, S González, JM Moyano, S García… - International Journal of …, 2017 - Springer
This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source
Java framework (GPLv3 license) that provides a number of modules to perform a wide …

Confusion-matrix-based kernel logistic regression for imbalanced data classification

M Ohsaki, P Wang, K Matsuda… - … on Knowledge and …, 2017 - ieeexplore.ieee.org
There have been many attempts to classify imbalanced data, since this classification is
critical in a wide variety of applications related to the detection of anomalies, failures, and …

A new representation in PSO for discretization-based feature selection

B Tran, B Xue, M Zhang - IEEE Transactions on Cybernetics, 2017 - ieeexplore.ieee.org
In machine learning, discretization and feature selection (FS) are important techniques for
preprocessing data to improve the performance of an algorithm on high-dimensional data …

Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement

J Hernández-Orallo - Artificial Intelligence Review, 2017 - Springer
The evaluation of artificial intelligence systems and components is crucial for the progress of
the discipline. In this paper we describe and critically assess the different ways AI systems …