Big data preprocessing: methods and prospects

S García, S Ramírez-Gallego, J Luengo, JM Benítez… - Big data analytics, 2016 - Springer
The massive growth in the scale of data has been observed in recent years being a key
factor of the Big Data scenario. Big Data can be defined as high volume, velocity and variety …

Systematic literature review of preprocessing techniques for imbalanced data

EA Felix, SP Lee - IET Software, 2019 - Wiley Online Library
Data preprocessing remains an important step in machine learning studies. This is because
proper preprocessing of imbalanced data can enable researchers to reduce defects as …

Data preprocessing techniques for classification without discrimination

F Kamiran, T Calders - Knowledge and Information Systems, 2012 - Springer
Recently, the following Discrimination-Aware Classification Problem was
introduced: Suppose we are given training data that exhibit unlawful discrimination; e.g. …

Tutorial on practical tips of the most influential data preprocessing algorithms in data mining

S García, J Luengo, F Herrera - Knowledge-Based Systems, 2016 - Elsevier
Data preprocessing is a major and essential stage whose main goal is to obtain final data
sets that can be considered correct and useful for further data mining algorithms. This paper …

Artificial neural network modeling of the water quality index for Kinta River (Malaysia) using water quality variables as predictors

NM Gazzaz, MK Yusoff, AZ Aris, H Juahir… - Marine pollution …, 2012 - Elsevier
This article describes design and application of feed-forward, fully-connected, three-layer
perceptron neural network model for computing the water quality index (WQI) for Kinta …

On the choice of the best imputation methods for missing values considering three groups of classification methods

J Luengo, S García, F Herrera - Knowledge and Information Systems, 2012 - Springer
In real-life data, information is frequently lost in data mining, caused by the presence of
missing values in attributes. Several schemes have been studied to overcome the …

Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model

AM Sefidian, N Daneshpour - Expert Systems with Applications, 2019 - Elsevier
The presence of missing values in real-world data is not only a prevalent problem but also
an inevitable one. Therefore, missing values should be handled carefully before the mining …

A survey of data semantization in internet of things

F Shi, Q Li, T Zhu, H Ning - Sensors, 2018 - mdpi.com
With the development of Internet of Things (IoT), more and more sensors, actuators and
mobile devices have been deployed into our daily lives. The result is that tremendous data …

Missing value imputation through shorter interval selection driven by Fuzzy C-Means clustering

H Khan, X Wang, H Liu - Computers & Electrical Engineering, 2021 - Elsevier
The presence of missing data is a common and pivotal issue, which generally leads to a
serious decrease of data quality and thus indicates the necessity to effectively handle …

Missing value imputation using a fuzzy clustering-based EM approach

MG Rahman, MZ Islam - Knowledge and Information Systems, 2016 - Springer
Data preprocessing and cleansing play a vital role in data mining by ensuring good quality
of data. Data-cleansing tasks include imputation of missing values, identification of outliers …