The problem of mining integrity constraints from data has been extensively studied over the past two decades for commonly used types of constraints including the classic Functional …
Record fusion is the task of aggregating multiple records that correspond to the same real-world entity in a database. We can view record fusion as a machine learning problem where …
We introduce a learning framework for the problem of unifying conflicting data in multiple records referring to the same entity—we call this problem “record fusion.” Record fusion …
IF Ilyas, F Naumann - IEEE Data Eng. Bull., 2022 - sites.computer.org
To conclude, we suggest opening a new chapter of data quality and data cleaning that understands the entire data processing pipeline, in particular tracing it to the …
In most theoretical studies on missing data analysis, data is typically assumed to be missing according to a specific probabilistic model. However, such an assumption may not accurately …
Data deduplication is the task of detecting records in a database that correspond to the same real-world entity. Our goal is to develop a procedure that samples uniformly from the …
Many errors cannot be detected or repaired without taking into account the underlying structure and dependencies in the dataset. One way of modeling the structure of the data is …
Inference in structured prediction is naturally modeled with a graph, where the goal is to recover the unknown true label for each node given noisy observations corresponding to …