Data cleaning: Overview and emerging challenges

X Chu, IF Ilyas, S Krishnan, J Wang - Proceedings of the 2016 …, 2016 - dl.acm.org
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and
failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …

A review on data cleansing methods for big data

F Ridzuan, WMNW Zainon - Procedia Computer Science, 2019 - Elsevier
Massive amounts of data are available for the organization which will influence their
business decision. Data collected from the various resources are dirty and this will affect the …

Can foundation models wrangle your data?

A Narayan, I Chami, L Orr, S Arora, C Ré - arXiv preprint arXiv:2205.09911, 2022 - arxiv.org
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …

AutoML: A survey of the state-of-the-art

X He, K Zhao, X Chu - Knowledge-based systems, 2021 - Elsevier
Deep learning (DL) techniques have obtained remarkable achievements on various tasks,
such as image recognition, object detection, and language modeling. However, building a …

Benchmark and survey of automated machine learning frameworks

MA Zöller, MF Huber - Journal of artificial intelligence research, 2021 - jair.org
Machine learning (ML) has become a vital part in many aspects of our daily life.
However, building well performing machine learning applications requires highly …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

HoloClean: Holistic data repairs with probabilistic inference

T Rekatsinas, X Chu, IF Ilyas, C Ré - arXiv preprint arXiv:1702.00820, 2017 - arxiv.org
We introduce HoloClean, a framework for holistic data repairing driven by probabilistic
inference. HoloClean unifies existing qualitative data repairing approaches, which rely on …

Detecting data errors: Where are we and what needs to be done?

Z Abedjan, X Chu, D Deng, RC Fernandez… - Proceedings of the …, 2016 - dl.acm.org
Data cleaning has played a critical role in ensuring data quality for enterprise applications.
Naturally, there has been extensive research in this area, and many data cleaning …

C-Store: a column-oriented DBMS

M Stonebraker, DJ Abadi, A Batkin, X Chen… - … Databases Work: the …, 2018 - dl.acm.org
This paper presents the design of a read-optimized relational DBMS that contrasts sharply
with most current systems, which are write-optimized. Among the many differences in its …

HoloDetect: Few-shot learning for error detection

A Heidari, J McGrath, IF Ilyas… - Proceedings of the 2019 …, 2019 - dl.acm.org
We introduce a few-shot learning framework for error detection. We show that data
augmentation (a form of weak supervision) is key to training high-quality, ML-based error …