String similarity search and join: a survey

M Yu, G Li, D Deng, J Feng - Frontiers of Computer Science, 2016 - Springer
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …

[BOOK][B] The data matching process

P Christen - 2012 - Springer
This chapter provides an overview of the data matching process, and describes the five
major steps involved in this process: data pre-processing (cleaning and standardisation) …

Data-Centric Systems and Applications

MJ Carey, S Ceri, P Bernstein, U Dayal, C Faloutsos… - Italy: Springer, 2006 - Springer
The rapid growth of the Web in the past two decades has made it the largest publicly
accessible data source in the world. Web mining aims to discover useful information or …

A comparison of personal name matching: Techniques and practical issues

P Christen - Sixth IEEE International Conference on Data …, 2006 - ieeexplore.ieee.org
Finding and matching personal names is at the core of an increasing number of
applications: from text and Web mining, search engines, to information extraction …

[PDF] MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive

MN Bernstein, AH Doan, CN Dewey - Bioinformatics, 2017 - academic.oup.com
Motivation: The NCBI's Sequence Read Archive (SRA) promises great biological
insight if one could analyze the data in the aggregate; however, the data remain largely …

SJFuzz: Seed and Mutator Scheduling for JVM Fuzzing

M Wu, Y Ouyang, M Lu, J Chen, Y Zhao, H Cui… - Proceedings of the 31st …, 2023 - dl.acm.org
While the Java Virtual Machine (JVM) plays a vital role in ensuring correct executions of
Java applications, testing JVMs via generating and running class files on them can be rather …

Fast business process similarity search with feature-based similarity estimation

Z Yan, R Dijkman, P Grefen - … International Conferences" On the Move to …, 2010 - Springer
Nowadays, business process management plays an important role in the management of
organizations. More and more organizations describe their operations as business …

A Comparison of String Similarity Measures for Toponym Matching

G Recchia, M Louwerse - 2013 - dl.acm.org
The diversity of ways in which toponyms are specified often results in mismatches between
queries and the place names contained in gazetteers. Search terms that include unofficial …

Fast business process similarity search

Z Yan, R Dijkman, P Grefen - Distributed and Parallel Databases, 2012 - Springer
Nowadays, it is common for organizations to maintain collections of hundreds or even
thousands of business processes. Techniques exist to search through such a collection, for …

[BOOK][B] Febrl-Freely extensible biomedical record linkage

P Christen, T Churches - 2002 - cs.anu.edu.au
This manual describes prototype software called Febrl designed to undertake probabilistic
data cleaning and standardisation, deduplication and record linkage. Written in the Python …