An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Data cleaning: Overview and emerging challenges

X Chu, IF Ilyas, S Krishnan, J Wang - Proceedings of the 2016 …, 2016 - dl.acm.org
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and
failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few …

Can foundation models wrangle your data?

A Narayan, I Chami, L Orr, S Arora, C Ré - arXiv preprint arXiv:2205.09911, 2022 - arxiv.org
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …

A survey on data collection for machine learning: a big data-AI integration perspective

Y Roh, G Heo, SE Whang - IEEE Transactions on Knowledge …, 2019 - ieeexplore.ieee.org
Data collection is a major bottleneck in machine learning and an active research topic in
multiple communities. There are largely two reasons data collection has recently become a …

Deep entity matching with pre-trained language models

Y Li, J Li, Y Suhara, AH Doan, WC Tan - arXiv preprint arXiv:2004.00584, 2020 - arxiv.org
We present Ditto, a novel entity matching system based on pre-trained Transformer-based
language models. We fine-tune and cast EM as a sequence-pair classification problem to …

Data-centric artificial intelligence: A survey

D Zha, ZP Bhat, KH Lai, F Yang, Z Jiang… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler
of its great success is the availability of abundant and high-quality data for building machine …

Deep learning for entity matching: A design space exploration

S Mudgal, H Li, T Rekatsinas, AH Doan… - Proceedings of the …, 2018 - dl.acm.org
Entity matching (EM) finds data instances that refer to the same real-world entity. In this
paper we examine applying deep learning (DL) to EM, to understand DL's benefits and …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

DeepER--Deep Entity Resolution

M Ebraheem, S Thirumuruganathan, S Joty… - arXiv preprint arXiv …, 2017 - arxiv.org
Entity resolution (ER) is a key data integration problem. Despite the efforts in 70+ years in all
aspects of ER, there is still a high demand for democratizing ER - humans are heavily …

Truth inference in crowdsourcing: Is the problem solved?

Y Zheng, G Li, Y Li, C Shan, R Cheng - Proceedings of the VLDB …, 2017 - dl.acm.org
Crowdsourcing has emerged as a novel problem-solving paradigm, which facilitates
addressing problems that are hard for computers, e.g., entity resolution and sentiment …