Data cleansing mechanisms and approaches for big data analytics: a systematic study

M Hosseinzadeh, E Azhir, OH Ahmed… - Journal of Ambient …, 2023 - Springer
With the evolution of new technologies, the production of digital data is constantly growing. It
is thus necessary to develop data management strategies in order to handle the large-scale …

Towards a data quality framework for heterogeneous data

N Micic, D Neagu, F Campean… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Every industry has significant data output as a product of their working process, and with the
recent advent of big data mining and integrated data warehousing it is the case for a robust …

Horizon: Scalable dependency-driven data cleaning

EK Rezig, M Ouzzani, WG Aref… - Proceedings of the …, 2021 - dl.acm.org
A large class of data repair algorithms rely on integrity constraints to detect and repair errors.
A well-studied class of constraints is Functional Dependencies (FDs, for short). Although …

Enriching data imputation under similarity rule constraints

S Song, Y Sun, A Zhang, L Chen… - IEEE transactions on …, 2018 - ieeexplore.ieee.org
Incomplete information often occurs along with many database applications, e.g., in data
integration, data cleaning, or data exchange. The idea of data imputation is often to fill the …

PClean: Bayesian data cleaning at scale with domain-specific probabilistic programming

A Lew, M Agrawal, D Sontag… - … conference on artificial …, 2021 - proceedings.mlr.press
Data cleaning is naturally framed as probabilistic inference in a generative model of ground-
truth data and likely errors, but the diversity of real-world error patterns and the hardness of …

Locater: cleaning WiFi connectivity datasets for semantic localization

Y Lin, D Jiang, R Yus, G Bouloukakis, A Chio… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper explores the data cleaning challenges that arise in using WiFi connectivity data
to locate users to semantic indoor locations such as buildings, regions, rooms. WiFi …

A functional taxonomy of data quality tools: Insights from science and practice

M Altendeitering, M Tomczyk - 2022 - aisel.aisnet.org
For organizations data quality is a prerequisite for automated decision making and agility. To
provide high quality data, numerous tools have emerged that support the different steps of …

Monitoring IoT networks for botnet activity

SS Chawathe - 2018 IEEE 17th International Symposium on …, 2018 - ieeexplore.ieee.org
The Internet of Things (IoT) has rapidly transitioned from a novelty to a common, and often
critical, part of residential, business, and industrial environments. Security vulnerabilities and …

Machine learning to data management: A round trip

L Berti-Equille, A Bonifati, T Milo - 2018 IEEE 34th International …, 2018 - ieeexplore.ieee.org
With the emergence of machine learning (ML) techniques in database research, ML has
already proved a tremendous potential to dramatically impact the foundations, algorithms …

An efficient learning based approach for automatic record deduplication with benchmark datasets

M Ravikanth, S Korra, G Mamidisetti, M Goutham… - Scientific Reports, 2024 - nature.com
With technological innovations, enterprises in the real world are managing every iota of data
as it can be mined to derive business intelligence (BI). However, when data comes from …