The generation of synthetic data can be used for anonymization, regularization, oversampling, semi-supervised learning, self-supervised learning, and several other tasks …
A Heidari, J McGrath, IF Ilyas… - Proceedings of the 2019 …, 2019 - dl.acm.org
We introduce a few-shot learning framework for error detection. We show that data augmentation (a form of weak supervision) is key to training high-quality, ML-based error …
IF Ilyas, T Rekatsinas - ACM Journal of Data and Information Quality …, 2022 - dl.acm.org
The last few years witnessed significant advances in building automated or semi-automated data quality, data cleaning and data integration systems powered by machine learning (ML) …
Organizations are increasingly relying on data to support decisions. When data contains private and sensitive information, the data owner often desires to publish a synthetic …
We investigate the complexity of computing an optimal repair of an inconsistent database, in the case where integrity constraints are Functional Dependencies (FDs). We focus on two …
L Bertossi - Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI …, 2019 - dl.acm.org
In this article we review the main concepts around database repairs and consistent query answering, with emphasis on tracing back the origin, motivation, and early developments …
We study the problem of discovering functional dependencies (FD) from a noisy data set. We adopt a statistical perspective and draw connections between FD discovery and structure …
A Lew, M Agrawal, D Sontag… - … conference on artificial …, 2021 - proceedings.mlr.press
Data cleaning is naturally framed as probabilistic inference in a generative model of ground-truth data and likely errors, but the diversity of real-world error patterns and the hardness of …
D Miao, Z Cai, J Li, X Gao, X Liu - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Computing an optimal subset repair of an inconsistent database is becoming a standalone research problem and has a wide range of applications. However, it has not been well …