Data preparation: A technological perspective and review

AAA Fernandes, M Koehler, N Konstantinou… - SN Computer …, 2023 - Springer
Data analysis often uses data sets that were collected for different purposes. Indeed, new
insights are often obtained by combining data sets that were produced independently of …

Baran: Effective error correction via a unified context representation and transfer learning

M Mahdavi, Z Abedjan - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Traditional error correction solutions leverage handmade rules or master data to find the
correct values. Both are often amiss in real-world scenarios. Therefore, it is desirable to …

An experimental survey of missing data imputation algorithms

X Miao, Y Wu, L Chen, Y Gao… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Due to the ubiquity of missing data, data imputation has received extensive attention in the
past decades. It is a well-recognized problem impacting almost all fields of scientific study …

Parker: Data fusion through consistent repairs using edit rules under partial keys

A Bronselaer, M Acosta - Information Fusion, 2023 - Elsevier
Data integration is the problem of consolidating information provided by multiple sources.
After schema mapping and duplicate detection have been dealt with, the problem consists in …

Similarity Measures For Incomplete Database Instances

B Glavic, G Mecca, RJ Miller, P Papotti… - Advances in Database …, 2024 - iris.unibas.it
The problem of comparing database instances with incompleteness is prevalent in
applications such as analyzing how a dataset has evolved over time (e.g., data versioning) …

Fast detection of denial constraint violations

EHM Pena, EC de Almeida, F Naumann - Proceedings of the VLDB …, 2021 - dl.acm.org
The detection of constraint-based errors is a critical task in many data cleaning solutions.
Previous works perform the task either using traditional data management systems or using …

Fast approximate denial constraint discovery

R Xiao, Z Tan, H Wang, S Ma - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
We investigate the problem of discovering approximate denial constraints (DCs), for finding
DCs that hold with some exceptions to avoid overfitting real-life dirty data and facilitate data …

Fast Algorithms for Denial Constraint Discovery

EHM Pena, F Porto, F Naumann - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Denial constraints (DCs) are an integrity constraint formalism widely used to detect
inconsistencies in data. Several algorithms have been devised to discover DCs from data …

Data quality management: an overview of methods and challenges

A Bronselaer - Flexible Query Answering Systems: 14th International …, 2021 - Springer
Data quality is a problem studied in many different research disciplines like computer
science, statistics and economics. More often than not, these different disciplines come with …

Cleaning data with selection rules

T Boeckling, G De Tré, A Bronselaer - IEEE Access, 2022 - ieeexplore.ieee.org
In this paper, we propose and study a type of tuple-level constraint that arises from the
selection operator of relational algebra and that closely resembles the concepts of tuple …