An experimental survey of missing data imputation algorithms

X Miao, Y Wu, L Chen, Y Gao… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Due to the ubiquity of missing data, data imputation has received extensive attention in the
past decades. It is a well-recognized problem impacting almost all fields of scientific study …

Study on missing values and outlier detection in concurrence with data quality enhancement for efficient data processing

FA Vinisha, L Sujihelen - 2022 4th international conference on …, 2022 - ieeexplore.ieee.org
Data analytics is the process of analyzing raw data to make predictions and derive
conclusions. This process involves collecting and organizing data to discover hidden …

Data Quality Assessment: Challenges and Opportunities

S Mohammed, H Harmouch, F Naumann… - arXiv preprint arXiv …, 2024 - arxiv.org
Data-oriented applications, their users, and even the law require data of high quality.
Research has broken down the rather vague notion of data quality into various dimensions …

Rein: A comprehensive benchmark framework for data cleaning methods in ML pipelines

M Abdelaal, C Hammacher… - arXiv preprint arXiv …, 2023 - openproceedings.org
Nowadays, machine learning (ML) plays a vital role in many aspects of our daily life. In
essence, building well-performing ML applications requires the provision of high-quality …

Data cleaning using large language models

S Zhang, Z Huang, E Wu - arXiv preprint arXiv:2410.15547, 2024 - arxiv.org
Data cleaning is a crucial yet challenging task in data analysis, often requiring significant
manual effort. To automate data cleaning, previous systems have relied on statistical rules …

Pattern functional dependencies for data cleaning

A Qahtan, N Tang, M Ouzzani, Y Cao… - Proceedings of the …, 2020 - research.ed.ac.uk
Patterns (or regex-based expressions) are widely used to constrain the format of a domain
(or a column), e.g., a Year column should contain only four digits, and thus a value like “1980 …

Autocure: Automated tabular data curation technique for ML pipelines

M Abdelaal, R Koparde, H Schoening - Proceedings of the Sixth …, 2023 - dl.acm.org
Machine learning algorithms have become increasingly prevalent in multiple domains, such
as autonomous driving, healthcare, and finance. In such domains, data preparation remains …

Similarity Measures For Incomplete Database Instances

B Glavic, G Mecca, RJ Miller, P Papotti… - Advances in Database …, 2024 - iris.unibas.it
The problem of comparing database instances with incompleteness is prevalent in
applications such as analyzing how a dataset has evolved over time (e.g., data versioning) …

Regression with sensor data containing incomplete observations

T Katsuki, T Osogami - International Conference on Machine …, 2023 - proceedings.mlr.press
This paper addresses a regression problem in which output label values are the results of
sensing the magnitude of a phenomenon. A low value of such labels can mean either that …

Anatomy of metadata for data curation

L Visengeriyeva, Z Abedjan - Journal of Data and Information Quality …, 2020 - dl.acm.org
Real-world datasets often suffer from various data quality problems. Several data cleaning
solutions have been proposed so far. However, data cleaning remains a manual and …