PClean: Bayesian data cleaning at scale with domain-specific probabilistic programming

A Lew, M Agrawal, D Sontag… - … conference on artificial …, 2021 - proceedings.mlr.press
Data cleaning is naturally framed as probabilistic inference in a generative model of ground-
truth data and likely errors, but the diversity of real-world error patterns and the hardness of …

BClean: A Bayesian Data Cleaning System

J Qin, S Huang, Y Wang, J Zhu, Y Zhang… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
There is a considerable body of work on data cleaning which employs various principles to
rectify erroneous data and transform a dirty dataset into a cleaner one. One of prevalent …

Statistical relational learning based automatic data cleaning

W Li, L Li, Z Li, M Cui - Frontiers of Computer Science, 2019 - journal.hep.com.cn
Data in real world is usually dirty, i.e., it may contain inconsistent, noisy, incomplete or
duplicated values. Generally speaking, the identified dimensions of data quality …

Automatic Data Repairs with Statistical Relational Learning

L Li, W Li, L Zhu, C Li, Z Zhang - 2021 International Symposium …, 2021 - ieeexplore.ieee.org
Dirty data is ubiquitous in real-world, and data cleaning is a long-standing problem. The
importance of data cleaning is growing in the era of big data. In this paper we propose a …

PClean: Bayesian data cleaning at scale with domain-specific probabilistic programming

AK Lew, M Agrawal, D Sontag… - arXiv preprint arXiv …, 2020 - arxiv.org
Data cleaning is naturally framed as probabilistic inference in a generative model of ground-
truth data and likely errors, but the diversity of real-world error patterns and the hardness of …

Issues of data governance associated with data mining in medical research: experiences from an empirical study

J Nahar, T Imam, KS Tickle… - … Governance in a …, 2013 - ebooks.iospress.nl
This chapter is a review of data mining techniques used in medical research. It will cover the
existing applications of these techniques in the identification of diseases, and also present …

[PDF] Data quality and machine learning: detecting semantic errors in structured data with unknown schema

AD Lentini - 2021 - ri.itba.edu.ar
Abstract: The general objective of this work is to evaluate whether machine learning
techniques from the field of natural language processing …

Continuous Data Integration for Land Use and Transportation Planning and Modeling

L Wang, K Kim - 2014 - pdxscholar.library.pdx.edu
There is an urgent need for improved models that address the interdependencies between
land use and transportation, and considerable new work is underway to develop such …

[BOOK][B] Utility of Considering Multiple Alternative Rectifications in Data Cleaning

PIS Rihan - 2013 - search.proquest.com
Most data cleaning systems aim to go from a given deterministic dirty database to another
deterministic but clean database. Such an enterprise pre-supposes that it is in fact possible …

[CITATION][C] Repairing inconsistency in relational data based on the possible world model

徐耀丽, 李战怀, 陈群, 钟评 - Journal of Software (软件学报), 2016