A review of the F-measure: its history, properties, criticism, and alternatives

P Christen, DJ Hand, N Kirielle - ACM Computing Surveys, 2023 - dl.acm.org
Methods to classify objects into two or more classes are at the core of various disciplines.
When a set of objects with their true classes is available, a supervised classifier can be …

Convergence Diagnostics for Entity Resolution

S Aleshin-Guendel, RC Steorts - Annual Review of Statistics …, 2024 - annualreviews.org
Entity resolution is the process of merging and removing duplicate records from multiple
data sources, often in the absence of unique identifiers. Bayesian models for entity …

Thirty-three myths and misconceptions about population data: from data capture and processing to linkage

P Christen, R Schnell - International Journal of Population Data …, 2023 - ncbi.nlm.nih.gov
Databases covering all individuals of a population are increasingly used for research and
decision-making. The massive size of such databases is often mistaken as a guarantee for …

A knowledge graph embeddings based approach for author name disambiguation using literals

C Santini, GA Gesese, S Peroni, A Gangemi, H Sack… - Scientometrics, 2022 - Springer
Scholarly data is growing continuously containing information about the articles from a
plethora of venues including conferences, journals, etc. Many initiatives have been taken to …

Deep Learning for Economists

M Dell - 2024 - nber.org
Deep learning provides powerful methods to impute structured information from large-scale,
unstructured text and image datasets. For example, economists might wish to detect the …

An analysis of one-to-one matching algorithms for entity resolution

G Papadakis, V Efthymiou, E Thanos, O Hassanzadeh… - The VLDB Journal, 2023 - Springer
Entity resolution (ER) is the task of finding records that refer to the same real-world entities. A
common scenario, which we refer to as Clean-Clean ER, is to resolve records across two …

CHORUS: A New Dataset of State Interest Group Policy Positions in the United States

G Hall, JA Basseches, R Bromley-Trujillo… - State Politics & Policy …, 2024 - cambridge.org
Research on the activities and influence of interest groups in state legislatures faces a data
problem: we are missing a comprehensive, systematic dataset of interest groups' policy …

Using machine learning to link electronic health records in cancer registries: On the tradeoff between linkage quality and manual effort

P Röchner, F Rothlauf - International Journal of Medical Informatics, 2024 - Elsevier
Background Cancer registries link a large number of electronic health records reported by
medical institutions to already registered records of the matching individual and tumor …

Bayesian record linkage with variables in one file

G Kamat, M Shan, R Gutman - Statistics in Medicine, 2023 - Wiley Online Library
In many healthcare and social science applications, information about units is dispersed
across multiple data files. Linking records across files is necessary to estimate the …

A Primer on the Data Cleaning Pipeline

RC Steorts - Journal of Survey Statistics and Methodology, 2023 - academic.oup.com
The availability of both structured and unstructured databases, such as electronic health
data, social media data, patent data, and surveys that are often updated in real time, among …