An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Framework for evaluating clustering algorithms in duplicate detection

O Hassanzadeh, F Chiang, HC Lee… - Proceedings of the VLDB …, 2009 - dl.acm.org
The presence of duplicate records is a major data quality concern in large databases. To
detect duplicates, entity resolution, also known as duplication detection or record linkage, is …

Three-dimensional entity resolution with JedAI

G Papadakis, G Mandilaras, L Gagliardelli… - Information Systems, 2020 - Elsevier
Entity Resolution (ER) is the task of detecting different entity profiles that describe the same
real-world objects. To facilitate its execution, we have developed JedAI, an open-source …

Linking temporal records

P Li, XL Dong, A Maurino, D Srivastava - Proceedings of the VLDB …, 2011 - dl.acm.org
Many data sets contain temporal records over a long period of time; each record is
associated with a time stamp and describes some aspects of a real-world entity at that …

End-to-end entity resolution for big data: A survey

V Christophides, V Efthymiou, T Palpanas… - arXiv preprint arXiv …, 2019 - arxiv.org
One of the most important tasks for improving data quality and the reliability of data analytics
results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the …

An analysis of one-to-one matching algorithms for entity resolution

G Papadakis, V Efthymiou, E Thanos, O Hassanzadeh… - The VLDB Journal, 2023 - Springer
Entity resolution (ER) is the task of finding records that refer to the same real-world entities. A
common scenario, which we refer to as Clean-Clean ER, is to resolve records across two …

Bipartite graph matching algorithms for clean-clean entity resolution: an empirical evaluation

G Papadakis, V Efthymiou, E Thanos… - arXiv preprint arXiv …, 2021 - arxiv.org
Entity Resolution (ER) is the task of finding records that refer to the same real-world entities.
A common scenario is when entities across two clean sources need to be resolved, which …

Extended affinity propagation clustering for multi-source entity resolution

S Lerm, A Saeedi, E Rahm - 2021 - dl.gi.de
Entity resolution is the data integration task of identifying matching entities (e.g., products,
customers) in one or several data sources. Previous approaches for matching and clustering …

SuperPart: Supervised graph partitioning for record linkage

R Reas, S Ash, R Barton… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Identifying sets of items that are equivalent to one another is a problem common to many
fields. Systems addressing this generally have at their core a function s(d_i, d_j) for …

Leveraging social media signals for record linkage

AT Schneider, A Mukherjee, EC Dragut - … of the 2018 World Wide Web …, 2018 - dl.acm.org
Many data-intensive applications collect (structured) data from a variety of sources. A key
task in this process is record linkage, which is the problem of determining the records from …