SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary

A Fernández, S Garcia, F Herrera, NV Chawla - Journal of artificial …, 2018 - jair.org
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is
considered the "de facto" standard in the framework of learning from imbalanced data. This is …

Learning from imbalanced data: open challenges and future directions

B Krawczyk - Progress in Artificial Intelligence, 2016 - Springer
Despite more than two decades of continuous development, learning from imbalanced data
is still a focus of intense research. Starting as a problem of skewed distributions of binary …

Big data preprocessing: methods and prospects

S García, S Ramírez-Gallego, J Luengo, JM Benítez… - Big data analytics, 2016 - Springer
The massive growth in the scale of data observed in recent years is a key factor of the Big
Data scenario. Big Data can be defined as high volume, velocity and variety …

A survey of imbalanced data classification methods (不平衡数据分类方法综述)

Y Li, Y Chai, Y Hu, H Yin - Control and Decision (控制与决策), 2019 - kzyjc.alljournals.cn
With the rapid development of information technology, data in every field are being generated at an unprecedented rate and widely collected and stored. How to process these data
intelligently and exploit the valuable information they contain has become a hot topic in both theoretical and applied research …

Tutorial on practical tips of the most influential data preprocessing algorithms in data mining

S García, J Luengo, F Herrera - Knowledge-Based Systems, 2016 - Elsevier
Data preprocessing is a major and essential stage whose main goal is to obtain final data
sets that can be considered correct and useful for further data mining algorithms. This paper …

KEEL 3.0: an open source software for multi-stage analysis in data mining

I Triguero, S González, JM Moyano, S García… - International Journal of …, 2017 - Springer
This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source
Java framework (GPLv3 license) that provides a number of modules to perform a wide …

Text classification method based on self-training and LDA topic models

M Pavlinek, V Podgorelec - Expert Systems with Applications, 2017 - Elsevier
Supervised text classification methods are efficient when they can learn with reasonably
sized labeled sets. On the other hand, when only a small set of labeled documents is …

Self-training semi-supervised classification based on density peaks of data

D Wu, M Shang, X Luo, J Xu, H Yan, W Deng, G Wang - Neurocomputing, 2018 - Elsevier
Having abundant unlabeled data but only a few labeled samples is a common problem in many
practical applications. A successful methodology to tackle this problem is self-training semi …
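
For context, a generic self-training loop is sketched below: train on the labeled set, pseudo-label the most confident unlabeled samples, fold them into the training set, and repeat. This is a simplification under assumed choices (a plain confidence threshold and a LogisticRegression base learner), not the density-peaks selection strategy proposed in the paper above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training_sketch(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
    """Iteratively pseudo-label high-confidence unlabeled samples and retrain."""
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    clf = LogisticRegression(max_iter=1000)
    for _ in range(max_iter):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)          # class probabilities for the unlabeled pool
        confident = proba.max(axis=1) >= threshold  # keep only confident predictions
        if not confident.any():
            break
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~confident]               # shrink the unlabeled pool
    return clf
```

The threshold trades off pseudo-label quality against how quickly the unlabeled pool is consumed; methods such as the density-peaks approach replace it with a structure-aware ranking of which samples to label first.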

A highly accurate framework for self-labeled semisupervised classification in industrial applications

D Wu, X Luo, G Wang, M Shang… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
The self-labeled technique, a paradigm of semisupervised classification (SSC), is highly
effective in alleviating the shortage of labeled data in classification tasks via an iterative self …

Hyperparameter learning for deep learning-based recommender systems

D Wu, B Sun, M Shang - IEEE Transactions on Services …, 2023 - ieeexplore.ieee.org
Progress in the deep learning (DL)-based recommender system (RS), particularly over the
past five years, has been startling. It reshapes the architectures of traditional RSs by lifting …