This is an overview of the end-to-end data cleaning process. Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data …
Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use-cases. It …
Data cleaning is an important problem and data quality rules are the most promising way to face it with a declarative approach. Previous work has focused on specific formalisms, such …
Data quality is one of the most important problems in data management. A database system typically aims to support the creation, maintenance and use of large amount of data …
B Saha, D Srivastava - 2014 IEEE 30th international conference …, 2014 - ieeexplore.ieee.org
In our Big Data era, data is being generated, collected and analyzed at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Recent …
W Fan - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Big data is typically characterized with 4V's: Volume, Velocity, Variety and Veracity. When it comes to big graphs, these challenges become even more staggering. Each and every of …
W Fan, F Geerts, J Li, M Xiong - IEEE Transactions on …, 2010 - ieeexplore.ieee.org
This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs are a recent extension of functional dependencies (FDs) by supporting patterns of …
In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair …
Recently, there has been a renovated interest in functional dependencies due to the possibility of employing them in several advanced database operations, such as data …