From Cleaning before ML to Cleaning for ML.

F Neutatz, B Chen, Z Abedjan, E Wu - IEEE Data Eng. Bull., 2021 - scholar.archive.org
Data cleaning is widely regarded as a critical piece of machine learning (ML) applications,
as data errors can corrupt models in ways that cause the application to operate incorrectly …

Rein: A comprehensive benchmark framework for data cleaning methods in ml pipelines

M Abdelaal, C Hammacher… - arXiv preprint arXiv …, 2023 - openproceedings.org
Nowadays, machine learning (ML) plays a vital role in many aspects of our daily life. In
essence, building well-performing ML applications requires the provision of high-quality …

SDLER: stacked dedupe learning for entity resolution in big data era

A Ngueilbaye, H Wang, DA Mahamat… - The Journal of …, 2021 - Springer
In the Big Data Era, Entity Resolution (ER) faces many challenges such as high
scalability, the coexistence of complex similarity metrics, tautonymy and synonym, and the …

When Can We Ignore Missing Data in Model Training?

C Zhen, AS Chabada, A Termehchy - … of the Seventh Workshop on Data …, 2023 - dl.acm.org
Imputing missing data is typically expensive, and as a result, people seek to avoid it when
possible. To address this issue, we introduce a method that determines when data cleaning …

GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models

M Yan, Y Wang, Y Wang, X Miao, J Li - … of the ACM on Management of …, 2024 - dl.acm.org
Data quality is critical across many applications. The utility of data is undermined by various
errors, making rigorous data cleaning a necessity. Traditional data cleaning systems depend …

Running Temporal Logical Queries on the Relational Model

S Appleby, G Bergami, G Morgan - Proceedings of the 26th International …, 2022 - dl.acm.org
The state of the art for model checking exploits computationally intensive solutions, bottlenecked
by either repeated data access or suboptimal algorithmic implementations. Our solution …

In-Database Data Imputation

M Perini, M Nikolic - Proceedings of the ACM on Management of Data, 2024 - dl.acm.org
Missing data is a widespread problem in many domains, creating challenges in data
analysis and decision making. Traditional techniques for dealing with missing data, such as …

Certain and Approximately Certain Models for Statistical Learning

C Zhen, N Aryal, A Termehchy… - Proceedings of the ACM on …, 2024 - dl.acm.org
Real-world data is often incomplete and contains missing values. To train accurate models
over real-world datasets, users need to spend a substantial amount of time and resources …

Crayfish: Navigating the Labyrinth of Machine Learning Inference in Stream Processing Systems.

S Horchidan, PH Chen, E Kritharakis, P Carbone… - EDBT, 2024 - openproceedings.org
As Machine Learning predictions are increasingly being used in business
analytics pipelines, integrating stream processing with model serving has become a …

Towards Consistent Language Models Using Declarative Constraints

J Mousavi, A Termehchy - arXiv preprint arXiv:2312.15472, 2023 - arxiv.org
Large language models have shown unprecedented abilities in generating linguistically
coherent and syntactically correct natural language output. However, they often return …