Data cleaning: Overview and emerging challenges

X Chu, IF Ilyas, S Krishnan, J Wang - Proceedings of the 2016 …, 2016 - dl.acm.org
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and
failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …

A review on data cleansing methods for big data

F Ridzuan, WMNW Zainon - Procedia Computer Science, 2019 - Elsevier
Massive amounts of data are available for the organization which will influence their
business decision. Data collected from the various resources are dirty and this will affect the …

“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI

N Sambasivan, S Kapania, H Highfill… - Proceedings of the …, 2021 - dl.acm.org
AI models are increasingly applied in high-stakes domains like health and conservation.
Data quality carries an elevated significance in high-stakes AI due to its heightened …

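[PDF] A survey of research on data quality and data cleaning

Guo Zhimao, Zhou Aoying - Journal of Software (软件学报), 2002 - jos.org.cn
This paper surveys research on data quality, with a focus on data cleaning. It first explains
the importance of data quality and how it is measured, and defines the data cleaning problem.
It then classifies data cleaning problems and analyzes approaches to solving them …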

A survey on data collection for machine learning: a big data-ai integration perspective

Y Roh, G Heo, SE Whang - IEEE Transactions on Knowledge …, 2019 - ieeexplore.ieee.org
Data collection is a major bottleneck in machine learning and an active research topic in
multiple communities. There are largely two reasons data collection has recently become a …

Benchmark and survey of automated machine learning frameworks

MA Zöller, MF Huber - Journal of artificial intelligence research, 2021 - jair.org
Machine learning (ML) has become a vital part in many aspects of our daily life.
However, building well performing machine learning applications requires highly …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

How data science workers work with data: Discovery, capture, curation, design, creation

M Muller, I Lange, D Wang, D Piorkowski… - Proceedings of the …, 2019 - dl.acm.org
With the rise of big data, there has been an increasing need for practitioners in this space
and an increasing opportunity for researchers to understand their workflows and design new …

[BOOK][B] Data mining: the textbook

CC Aggarwal - 2015 - Springer
This textbook explores the different aspects of data mining from the fundamentals to the
complex data types and their applications, capturing the wide diversity of problem domains …

Sherlock: A deep learning approach to semantic data type detection

M Hulsebos, K Hu, M Bakker, E Zgraggen… - Proceedings of the 25th …, 2019 - dl.acm.org
Correctly detecting the semantic type of data columns is crucial for data science tasks such
as automated data cleaning, schema matching, and data discovery. Existing data …