[BOOK][B] The data matching process

P Christen - 2012 - Springer
This chapter provides an overview of the data matching process, and describes the five
major steps involved in this process: data pre-processing (cleaning and standardisation) …

A survey of indexing techniques for scalable record linkage and deduplication

P Christen - IEEE transactions on knowledge and data …, 2011 - ieeexplore.ieee.org
Record linkage is the process of matching records from several databases that refer to the
same entities. When applied on a single database, this process is known as deduplication …

Record linkage: similarity measures and algorithms

N Koudas, S Sarawagi, D Srivastava - Proceedings of the 2006 ACM …, 2006 - dl.acm.org
This tutorial provides a comprehensive and cohesive overview of the key research results in
the area of record linkage methodologies and algorithms for identifying approximate …

[PDF] Record linkage: Current practice and future directions

L Gu, R Baxter, D Vickers, C Rainsford - CSIRO Mathematical and …, 2003 - Citeseer
Record linkage is the task of quickly and accurately identifying records corresponding to the
same entity from one or more data sources. Record linkage is also known as data cleaning …

A Bayesian decision model for cost optimal record matching

VS Verykios, GV Moustakides, MG Elfeky - The VLDB Journal, 2003 - Springer
In an error-free system with perfectly clean data, the construction of a global view of the data
consists of linking (in relational terms, joining) two or more tables on their key fields …

Iterative record linkage for cleaning and integration

I Bhattacharya, L Getoor - Proceedings of the 9th ACM SIGMOD …, 2004 - dl.acm.org
Record linkage, the problem of determining when two records refer to the same entity, has
applications for both data cleaning (deduplication) and for integrating data from multiple …

Automating the approximate record-matching process

VS Verykios, AK Elmagarmid, EN Houstis - Information sciences, 2000 - Elsevier
Data quality has many dimensions one of which is accuracy. Accuracy is usually
compromised by errors accidentally or intentionally introduced in a database system. These …

Record linkage methods for multidatabase data mining

V Torra, J Domingo-Ferrer - Information fusion in data mining, 2003 - Springer
Record linkage methods for multidatabase data mining. Vicenç Torra and Josep
Domingo-Ferrer. Institut d'Investigació …

Quality and complexity measures for data linkage and deduplication

P Christen, K Goiser - Quality measures in data mining, 2007 - Springer
Deduplicating one data set or linking several data sets are increasingly important tasks in
the data preparation steps of many data mining projects. The aim of such linkages is to …

Automatic record linkage using seeded nearest neighbour and support vector machine classification

P Christen - Proceedings of the 14th ACM SIGKDD international …, 2008 - dl.acm.org
The task of linking databases is an important step in an increasing number of data mining
projects, because linked data can contain information that is not available otherwise, or that …