Entity matching using large language models

R Peeters, C Bizer - arXiv preprint arXiv:2310.11244, 2023 - arxiv.org
Entity Matching is the task of deciding whether two entity descriptions refer to the same
real-world entity. Entity Matching is a central step in most data integration pipelines and an …

A critical re-evaluation of benchmark datasets for (deep) learning-based matching algorithms

G Papadakis, N Kirielle, P Christen… - arXiv preprint arXiv …, 2023 - arxiv.org
Entity resolution (ER) is the process of identifying records that refer to the same entities
within one or across multiple databases. Numerous techniques have been developed to …

WDC Products: A multi-dimensional entity matching benchmark

R Peeters, RC Der, C Bizer - arXiv preprint arXiv:2301.09521, 2023 - arxiv.org
The difficulty of an entity matching task depends on a combination of multiple factors such as
the amount of corner-case pairs, the fraction of entities in the test set that have not been …

VerifAI: Verified Generative AI

N Tang, C Yang, J Fan, L Cao - arXiv preprint arXiv:2307.02796, 2023 - arxiv.org
Generative AI has made significant strides, yet concerns about the accuracy and reliability of
its outputs continue to grow. Such inaccuracies can have serious consequences such as …

PromptEM: prompt-tuning for low-resource generalized entity matching

P Wang, X Zeng, L Chen, F Ye, Y Mao, J Zhu… - arXiv preprint arXiv …, 2022 - arxiv.org
Entity Matching (EM), which aims to identify whether two entity records from two relational
tables refer to the same real-world entity, is one of the fundamental problems in data …

Unicorn: A unified multi-tasking model for supporting matching tasks in data integration

J Tu, J Fan, N Tang, P Wang, G Li, X Du, X Jia… - Proceedings of the ACM …, 2023 - dl.acm.org
Data matching, which decides whether two data elements (e.g., string, tuple, column, or
knowledge graph entity) are the "same" (a.k.a. a match), is a key concept in data integration …

Better entity matching with transformers through ensembles

JF Low, BCM Fung, P Xiong - Knowledge-Based Systems, 2024 - Elsevier
In this paper, we introduce AttendEM, a framework for entity matching (EM), i.e., pairwise
identification of duplicates across databases. Eschewing the prevalent focus on text …

CEDA: learned cardinality estimation with domain adaptation

Z Wang, Q Zeng, N Wang, H Lu, Y Zhang - Proceedings of the VLDB …, 2023 - dl.acm.org
Cardinality Estimation (CE) is a fundamental but critical problem in DBMS query
optimization, while deep learning techniques have made significant breakthroughs in the …

DADER: hands-off entity resolution with domain adaptation

J Tu, X Han, J Fan, N Tang, C Chai, G Li… - Proceedings of the VLDB …, 2022 - dl.acm.org
Entity resolution (ER) is a core data integration problem that identifies pairs of data instances
referring to the same real-world entities, and the state-of-the-art results of ER are achieved …

HAIPipe: Combining human-generated and machine-generated pipelines for data preparation

S Chen, N Tang, J Fan, X Yan, C Chai, G Li… - Proceedings of the ACM …, 2023 - dl.acm.org
Data preparation is crucial in achieving optimized results for machine learning (ML).
However, having a good data preparation pipeline is highly non-trivial for ML practitioners …