A formal framework for probabilistic unclean databases

[BOOK] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com

This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

Cited by 358 · All 5 versions

[PDF] springer.com

Tabular and latent space synthetic data generation: a literature review

J Fonseca, F Bacao - Journal of Big Data, 2023 - Springer

The generation of synthetic data can be used for anonymization, regularization,
oversampling, semi-supervised learning, self-supervised learning, and several other tasks …

Cited by 77 · All 12 versions

[PDF] acm.org

Holodetect: Few-shot learning for error detection

A Heidari, J McGrath, IF Ilyas… - Proceedings of the 2019 …, 2019 - dl.acm.org

We introduce a few-shot learning framework for error detection. We show that data
augmentation (a form of weak supervision) is key to training high-quality, ML-based error …

Cited by 168 · All 7 versions
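
The HoloDetect entry names data augmentation as the key to training ML-based error detectors. The following is a minimal sketch of that idea, assuming errors can be simulated with simple character-level edits. It corrupts clean values to produce labeled (value, is_error) pairs. The `corrupt` and `augment` helpers and their noise operators are hypothetical and are not HoloDetect's own augmentation procedure.

```python
import random


def corrupt(value: str, rng: random.Random) -> str:
    """Return a copy of value with one random character-level edit (hypothetical noise model)."""
    if not value:
        return "x"
    i = rng.randrange(len(value))
    op = rng.choice(["delete", "swap", "replace"])
    if op == "delete":
        return value[:i] + value[i + 1:]
    if op == "swap" and i + 1 < len(value) and value[i] != value[i + 1]:
        # Swap two adjacent, distinct characters so the result always differs.
        return value[:i] + value[i + 1] + value[i] + value[i + 2:]
    # Otherwise replace the character at i with a different lowercase letter.
    alphabet = [c for c in "abcdefghijklmnopqrstuvwxyz" if c != value[i]]
    return value[:i] + rng.choice(alphabet) + value[i + 1:]


def augment(clean_values, n_errors, seed=0):
    """Label clean values 0 and synthetic corruptions of them 1."""
    rng = random.Random(seed)
    examples = [(v, 0) for v in clean_values]
    for _ in range(n_errors):
        source = rng.choice(clean_values)
        examples.append((corrupt(source, rng), 1))
    return examples
```

A classifier trained on such pairs sees many more positive (erroneous) examples than a handful of hand labels would provide. That is the weak-supervision effect the abstract refers to.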

Machine learning and data cleaning: Which serves the other?

IF Ilyas, T Rekatsinas - ACM Journal of Data and Information Quality …, 2022 - dl.acm.org

The last few years witnessed significant advances in building automated or semi-automated
data quality, data cleaning and data integration systems powered by machine learning (ML) …

Cited by 56

[PDF] arxiv.org

Kamino: Constraint-aware differentially private data synthesis

C Ge, S Mohapatra, X He, IF Ilyas - arXiv preprint arXiv:2012.15713, 2020 - arxiv.org

Organizations are increasingly relying on data to support decisions. When data contains
private and sensitive information, the data owner often desires to publish a synthetic …

Cited by 54 · All 7 versions

[PDF] acm.org

Computing optimal repairs for functional dependencies

E Livshits, B Kimelfeld, S Roy - ACM Transactions on Database Systems …, 2020 - dl.acm.org

We investigate the complexity of computing an optimal repair of an inconsistent database, in
the case where integrity constraints are Functional Dependencies (FDs). We focus on two …

Cited by 77 · All 11 versions
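
Any repair of a database that violates functional dependencies begins with the tuples that jointly violate an FD X → Y: pairs that agree on X but differ on Y. The following is a minimal quadratic-time sketch. It assumes rows are Python dicts and that an FD is given as lists of left-hand-side and right-hand-side attribute names. This representation is an illustrative assumption. The sketch only detects violations and does not compute the optimal repair studied in the paper.

```python
from itertools import combinations


def fd_violations(rows, lhs, rhs):
    """Return index pairs (i, j) of rows that agree on lhs but differ on rhs."""
    violations = []
    for i, j in combinations(range(len(rows)), 2):
        same_lhs = all(rows[i][a] == rows[j][a] for a in lhs)
        same_rhs = all(rows[i][a] == rows[j][a] for a in rhs)
        if same_lhs and not same_rhs:
            violations.append((i, j))
    return violations


rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "60601", "city": "Chicago"},
]
print(fd_violations(rows, ["zip"], ["city"]))  # [(0, 1)]
```

These violating pairs form the conflict structure that a repair must resolve, either by deleting tuples or by updating cell values.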

[PDF] carleton.ca

Database repairs and consistent query answering: Origins and further developments

L Bertossi - Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI …, 2019 - dl.acm.org

In this article we review the main concepts around database repairs and consistent query
answering, with emphasis on tracing back the origin, motivation, and early developments …

Cited by 61 · All 4 versions

[PDF] acm.org

A statistical perspective on discovering functional dependencies in noisy data

Y Zhang, Z Guo, T Rekatsinas - Proceedings of the 2020 ACM SIGMOD …, 2020 - dl.acm.org

We study the problem of discovering functional dependencies (FD) from a noisy data set. We
adopt a statistical perspective and draw connections between FD discovery and structure …

Cited by 38 · All 4 versions
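
In noisy data a true FD rarely holds exactly, so discovery needs a tolerance. One classic score is the g3 error: the minimum fraction of tuples that must be deleted for X → Y to hold. Grouping on X and keeping the most frequent Y value in each group gives exactly that minimum. This sketch uses g3 as a stand-in for illustration. It is not the statistical, structure-learning approach proposed in the paper, and the dict-based row format is an assumption.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Fraction of rows to delete so that lhs -> rhs holds exactly."""
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        groups[x][y] += 1
    # In each lhs-group keep the rows with the most frequent rhs value; delete the rest.
    kept = sum(max(counts.values()) for counts in groups.values())
    return (len(rows) - kept) / len(rows)
```

A low but nonzero g3 (for example, one stray "NYC" among many "NY" rows for the same zip code) suggests a real dependency obscured by noise rather than an absent one.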

[PDF] mlr.press

PClean: Bayesian data cleaning at scale with domain-specific probabilistic programming

A Lew, M Agrawal, D Sontag… - … conference on artificial …, 2021 - proceedings.mlr.press

Data cleaning is naturally framed as probabilistic inference in a generative model of
ground-truth data and likely errors, but the diversity of real-world error patterns and the hardness of …

Cited by 29 · All 2 versions
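
The PClean abstract frames cleaning as inference in a generative model of true data plus errors. The following is a toy noisy-channel sketch of that framing. The prior over true values comes from hypothetical frequency counts. The typo likelihood is an assumed score that decays by a factor `p_typo` per edit. Bayes' rule then gives a posterior over the clean value, normalized over the candidate domain. This is not PClean's modeling language or inference algorithm, and `p_typo` is assumed to be strictly between 0 and 1.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def posterior(observed, prior_counts, p_typo=0.05):
    """Posterior over the true value given one observed (possibly dirty) value."""
    total = sum(prior_counts.values())
    scores = {}
    for value, count in prior_counts.items():
        d = edit_distance(observed, value)
        # Assumed noise model: an exact match is likely, and each extra edit multiplies by p_typo.
        likelihood = (1 - p_typo) if d == 0 else p_typo ** d
        scores[value] = (count / total) * likelihood
    z = sum(scores.values())
    return {v: s / z for v, s in scores.items()}


post = posterior("Chicgo", {"Chicago": 90, "Boston": 10})
print(max(post, key=post.get))  # Chicago
```

Returning a full posterior rather than a single fix lets downstream code decide whether a repair is confident enough to apply automatically.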

[PDF] vldb.org

The computation of optimal subset repairs

D Miao, Z Cai, J Li, X Gao, X Liu - Proceedings of the VLDB Endowment, 2020 - dl.acm.org

Computing an optimal subset repair of an inconsistent database is becoming a standalone
research problem and has a wide range of applications. However, it has not been well …

Cited by 26 · All 4 versions
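
A subset repair deletes tuples until every constraint holds. An optimal subset repair deletes as few as possible, assuming every tuple has equal weight. The following exhaustive sketch tries to keep subsets from largest to smallest and returns the first one that satisfies all FDs. Its cost is exponential in the number of tuples, so it suits only tiny illustrative instances. The `satisfies` and `optimal_subset_repair` helpers and the (lhs, rhs) FD encoding are assumptions made for this example.

```python
from itertools import combinations


def satisfies(rows, fds):
    """True if every FD (lhs, rhs) holds on rows."""
    for lhs, rhs in fds:
        seen = {}
        for row in rows:
            x = tuple(row[a] for a in lhs)
            y = tuple(row[a] for a in rhs)
            if seen.setdefault(x, y) != y:
                return False
    return True


def optimal_subset_repair(rows, fds):
    """Indices of a largest consistent subset of rows, found by exhaustive search."""
    n = len(rows)
    for size in range(n, -1, -1):
        for keep in combinations(range(n), size):
            if satisfies([rows[i] for i in keep], fds):
                return list(keep)
    return []


rows = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 2, "C": 3},
]
fds = [(["A"], ["B"]), (["B"], ["C"])]
print(optimal_subset_repair(rows, fds))  # [0, 2]
```

Here deleting the single middle tuple resolves both the A → B and the B → C conflicts, which is why the optimum removes one tuple rather than two.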
