Towards reliable interactive data cleaning: A user survey and recommendations

S Krishnan, D Haas, MJ Franklin, E Wu - … of the Workshop on Human-In …, 2016 - dl.acm.org
Data cleaning is frequently an iterative process tailored to the requirements of a specific
analysis task. The design and implementation of iterative data cleaning tools presents novel …

Wisteria: Nurturing scalable data cleaning infrastructure

D Haas, S Krishnan, J Wang, MJ Franklin… - Proceedings of the VLDB …, 2015 - dl.acm.org
Analysts report spending upwards of 80% of their time on problems in data cleaning. The
data cleaning process is inherently iterative, with evolving cleaning workflows that start with …

[PDF] Effective Data Cleaning with Continuous Evaluation.

IF Ilyas - IEEE Data Eng. Bull., 2016 - cs.uwaterloo.ca
Enterprises have been acquiring large amounts of data from a variety of sources to build
their own “Data Lakes”, with the goal of enriching their data asset and enabling richer and …

An extensible framework for data cleaning

H Galhardas, D Florescu, D Shasha, E Simon - 1999 - inria.hal.science
Data integration solutions dealing with large amounts of data have been strongly required in
the last few years. Besides the traditional data integration problems (e.g., schema integration …

AlphaClean: Automatic generation of data cleaning pipelines

S Krishnan, E Wu - arXiv preprint arXiv:1904.11827, 2019 - arxiv.org
The analyst effort in data cleaning is gradually shifting away from the design of hand-written
scripts to building and tuning complex pipelines of automated data cleaning libraries. Hyper …

Quantitative data cleaning for large databases

JM Hellerstein - 2013 - biblioteca.unisced.edu.mz
Data collection has become a ubiquitous function of large organizations, not only for record
keeping, but to support a variety of data analysis tasks that are critical to the organizational …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

Descriptive and prescriptive data cleaning

A Chalamalla, IF Ilyas, M Ouzzani… - Proceedings of the 2014 …, 2014 - dl.acm.org
Data cleaning techniques usually rely on some quality rules to identify violating tuples, and
then fix these violations using some repair algorithms. Oftentimes, the rules, which are …

Query-oriented data cleaning with oracles

M Bergman, T Milo, S Novgorodov… - Proceedings of the 2015 …, 2015 - dl.acm.org
As key decisions are often made based on information contained in a database, it is
important for the database to be as complete and correct as possible. For this reason, many …

Qualitative data cleaning

X Chu, IF Ilyas - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
Data quality is one of the most important problems in data management, since dirty data
often leads to inaccurate data analytics results and wrong business decisions. Data cleaning …