An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Big data systems: A software engineering perspective

A Davoudian, M Liu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Big Data Systems (BDSs) are an emerging class of scalable software technologies whereby
massive amounts of heterogeneous data are gathered from multiple sources, managed …

Can foundation models wrangle your data?

A Narayan, I Chami, L Orr, S Arora, C Ré - arXiv preprint arXiv:2205.09911, 2022 - arxiv.org
Foundation Models (FMs) are models trained on large corpora of data that, at very large
scale, can generalize to new tasks without any task-specific finetuning. As these models …

Deep entity matching with pre-trained language models

Y Li, J Li, Y Suhara, AH Doan, WC Tan - arXiv preprint arXiv:2004.00584, 2020 - arxiv.org
We present Ditto, a novel entity matching system based on pre-trained Transformer-based
language models. We fine-tune and cast EM as a sequence-pair classification problem to …

Deep learning for blocking in entity matching: a design space exploration

S Thirumuruganathan, H Li, N Tang… - Proceedings of the …, 2021 - dl.acm.org
Entity matching (EM) finds data instances that refer to the same real-world entity. Most EM
solutions perform blocking then matching. Many works have applied deep learning (DL) to …

[BOOK][B] The four generations of entity resolution

Entity Resolution (ER) lies at the core of data integration and cleaning and, thus, a bulk of
the research examines ways for improving its effectiveness and time efficiency. The initial …

Linking sensitive data

P Christen, T Ranbaduge, R Schnell - Methods and techniques for …, 2020 - Springer
Sensitive personal data are created in many application domains, and there is now an
increasing demand to share, integrate, and link such data within and across organisations in …

RPT: relational pre-trained transformer is almost all you need towards democratizing data preparation

N Tang, J Fan, F Li, J Tu, X Du, G Li, S Madden… - arXiv preprint arXiv …, 2020 - arxiv.org
Can AI help automate human-easy but computer-hard data preparation tasks that burden
data scientists, practitioners, and crowd workers? We answer this question by presenting …

Pre-trained embeddings for entity resolution: an experimental analysis

A Zeakis, G Papadakis, D Skoutas… - Proceedings of the VLDB …, 2023 - dl.acm.org
Many recent works on Entity Resolution (ER) leverage Deep Learning techniques involving
language models to improve effectiveness. This is applied to both main steps of ER, i.e., …

Cost-effective in-context learning for entity resolution: A design space exploration

M Fan, X Han, J Fan, C Chai, N Tang… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
Entity resolution (ER) is an important data integration task with a wide spectrum of
applications. The state-of-the-art solutions on ER rely on pre-trained language models …