[PDF] From Cleaning before ML to Cleaning for ML.

F Neutatz, B Chen, Z Abedjan, E Wu - IEEE Data Eng. Bull., 2021 - scholar.archive.org
Data cleaning is widely regarded as a critical piece of machine learning (ML) applications,
as data errors can corrupt models in ways that cause the application to operate incorrectly …

Machine learning and data cleaning: Which serves the other?

IF Ilyas, T Rekatsinas - ACM Journal of Data and Information Quality …, 2022 - dl.acm.org
The last few years witnessed significant advances in building automated or semi-automated
data quality, data cleaning and data integration systems powered by machine learning (ML) …

Data cleaning and machine learning: a systematic literature review

PO Côté, A Nikanjam, N Ahmed, D Humeniuk… - Automated Software …, 2024 - Springer
Abstract Machine Learning (ML) is integrated into a growing number of systems for various
applications. Because the performance of an ML model is highly dependent on the quality of …

A survey on data cleaning methods for improved machine learning model performance

GY Lee, L Alzamil, B Doskenov… - arXiv preprint arXiv …, 2021 - arxiv.org
Data cleaning is the initial stage of any machine learning project and is one of the most
critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of …

[BOOK] Data Cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

[PDF] CleanML: A benchmark for joint data cleaning and machine learning [experiments and analysis]

P Li, X Rao, J Blase, Y Zhang, X Chu… - arXiv preprint arXiv …, 2019 - chu-data-lab.github.io
It is widely recognized that the data quality affects machine learning (ML) model
performances, and data scientists spend considerable amount of time on data cleaning …

ActiveClean: Interactive data cleaning while learning convex loss models

S Krishnan, J Wang, E Wu, MJ Franklin… - arXiv preprint arXiv …, 2016 - arxiv.org
Data cleaning is often an important step to ensure that predictive models, such as regression
and classification, are not affected by systematic errors such as inconsistent, out-of-date, or …

[PDF] REIN: A comprehensive benchmark framework for data cleaning methods in ML pipelines

M Abdelaal, C Hammacher… - arXiv preprint arXiv …, 2023 - openproceedings.org
Nowadays, machine learning (ML) plays a vital role in many aspects of our daily life. In
essence, building well-performing ML applications requires the provision of high-quality …

CleanML: A study for evaluating the impact of data cleaning on ML classification tasks

P Li, X Rao, J Blase, Y Zhang, X Chu… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Data quality affects machine learning (ML) model performances, and data scientists spend
considerable amount of time on data cleaning before model training. However, to date, there …

ActiveClean: An interactive data cleaning framework for modern machine learning

S Krishnan, MJ Franklin, K Goldberg, J Wang… - Proceedings of the 2016 …, 2016 - dl.acm.org
Databases can be corrupted with various errors such as missing, incorrect, or inconsistent
values. Increasingly, modern data analysis pipelines involve Machine Learning, and the …