Machine learning and data cleaning: Which serves the other?

IF Ilyas, T Rekatsinas - ACM Journal of Data and Information Quality …, 2022 - dl.acm.org
The last few years witnessed significant advances in building automated or semi-automated
data quality, data cleaning and data integration systems powered by machine learning (ML) …

Approximate denial constraints

E Livshits, A Heidari, IF Ilyas, B Kimelfeld - arXiv preprint arXiv:2005.08540, 2020 - arxiv.org
The problem of mining integrity constraints from data has been extensively studied over the
past two decades for commonly used types of constraints including the classic Functional …

Record fusion: A learning approach

A Heidari, G Michalopoulos, S Kushagra… - arXiv preprint arXiv …, 2020 - arxiv.org
Record fusion is the task of aggregating multiple records that correspond to the same real-
world entity in a database. We can view record fusion as a machine learning problem where …

Record Fusion via Inference and Data Augmentation

A Heidari, G Michalopoulos, IF Ilyas… - ACM/IMS Journal of Data …, 2024 - dl.acm.org
We introduce a learning framework for the problem of unifying conflicting data in multiple
records referring to the same entity—we call this problem “record fusion.” Record fusion …

Data Errors: Symptoms, Causes and Origins.

IF Ilyas, F Naumann - IEEE Data Eng. Bull., 2022 - sites.computer.org
To conclude, we suggest opening a new chapter of data quality and data
cleaning that understands the entire data processing pipeline, in particular tracing it to the …

Graph-based analysis of non-random missing data problems with low-rank nature: Structured prediction, matrix completion and …

H Lee - 2023 - hammer.purdue.edu
In most theoretical studies on missing data analysis, data is typically assumed to be missing
according to a specific probabilistic model. However, such assumption may not accurately …

On sampling from data with duplicate records

A Heidari, S Kushagra, IF Ilyas - arXiv preprint arXiv:2008.10549, 2020 - arxiv.org
Data deduplication is the task of detecting records in a database that correspond to the
same real-world entity. Our goal is to develop a procedure that samples uniformly from the …

Structured Prediction on Dirty Datasets

A Heidarikhazaei - 2021 - uwspace.uwaterloo.ca
Many errors cannot be detected or repaired without taking into account the underlying
structure and dependencies in the dataset. One way of modeling the structure of the data is …

On the Fundamental Limits of Exact Inference in Structured Prediction

H Lee, K Bello, J Honorio - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Inference in structured prediction is naturally modeled with a graph, where the goal is to
recover the unknown true label for each node given noisy observations corresponding to …