Statistical Data Integration for Health Policy Evidence-Building

SM Paddock, C Franco, FJ Breidt… - Annual Review of …, 2024 - annualreviews.org
Health policy evidence-building requires data sources such as health care claims, electronic
health records, probability and nonprobability survey data, epidemiological surveillance …

Convergence Diagnostics for Entity Resolution

S Aleshin-Guendel, RC Steorts - Annual Review of Statistics …, 2024 - annualreviews.org
Entity resolution is the process of merging and removing duplicate records from multiple
data sources, often in the absence of unique identifiers. Bayesian models for entity …

Conditional partial exchangeability: a probabilistic framework for multi-view clustering

B Franzolini, M De Iorio, J Eriksson - arXiv preprint arXiv:2307.01152, 2023 - arxiv.org
Standard clustering techniques assume a common configuration for all features in a dataset.
However, when dealing with multi-view or longitudinal data, the clusters' number …

Multifile partitioning for record linkage and duplicate detection

S Aleshin-Guendel, M Sadinle - Journal of the American Statistical …, 2023 - Taylor & Francis
Merging datafiles containing information on overlapping sets of entities is a challenging task
in the absence of unique identifiers, and is further complicated when some entities are …

A unified framework for de-duplication and population size estimation (with discussion)

A Tancredi, R Steorts, B Liseo - 2020 - projecteuclid.org
A Unified Framework for De-Duplication and Population Size Estimation (with Discussion)
Bayesian Analysis (2020) 15, Number 2, pp. 633–682

Cohesion and repulsion in Bayesian distance clustering

A Natarajan, M De Iorio, A Heinecke… - Journal of the …, 2024 - Taylor & Francis
Clustering in high-dimensions poses many statistical challenges. While traditional
distance-based clustering methods are computationally feasible, they lack probabilistic interpretation …

Graph-aligned random partition model (GARP)

G Rebaudo, P Müller - Journal of the American Statistical …, 2024 - Taylor & Francis
Bayesian nonparametric mixtures and random partition models are powerful tools for
probabilistic clustering. However, standard independent mixture models can be restrictive in …

A Primer on the Data Cleaning Pipeline

RC Steorts - Journal of Survey Statistics and Methodology, 2023 - academic.oup.com
The availability of both structured and unstructured databases, such as electronic health
data, social media data, patent data, and surveys that are often updated in real time, among …

Bayesian estimation of cluster covariance matrices of unknown form

D Creal, J Kim - Journal of Econometrics, 2024 - Elsevier
We develop a flexible Bayesian model for cluster covariance matrices in large dimensions
where the number of clusters and the assignment of cross-sectional units to a cluster are a …

A prior for record linkage based on allelic partitions

B Betancourt, J Sosa, A Rodríguez - Computational Statistics & Data …, 2022 - Elsevier
In database management, record linkage aims to identify multiple records that correspond to
the same individual. Record linkage can be treated as a clustering problem in which one or …
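The last snippet says that record linkage can be treated as a clustering problem. The sketch below is a minimal illustration of that idea. It is not the allelic-partition prior proposed by Betancourt, Sosa and Rodríguez. It assumes that each record has already been given an entity label. The labels, the function names `clusters_from_labels` and `coreferent_pairs`, and the example data are all hypothetical. The labels define a partition of the records, and any two records in the same block count as a declared match.

```python
from collections import defaultdict
from itertools import combinations


def clusters_from_labels(labels):
    """Group record indices by their (hypothetical) entity label.

    Returns the induced partition as a sorted list of blocks, so the
    arbitrary label values themselves do not affect the result.
    """
    groups = defaultdict(list)
    for record_id, entity in enumerate(labels):
        groups[entity].append(record_id)
    return sorted(groups.values())


def coreferent_pairs(labels):
    """Return every pair of records assigned to the same entity (a declared match)."""
    pairs = []
    for cluster in clusters_from_labels(labels):
        # Singleton blocks contribute no pairs.
        pairs.extend(combinations(cluster, 2))
    return sorted(pairs)


# Illustrative data: records 0, 2 and 4 are assumed to refer to one entity.
labels = ["a", "b", "a", "c", "a"]
print(clusters_from_labels(labels))  # [[0, 2, 4], [1], [3]]
print(coreferent_pairs(labels))      # [(0, 2), (0, 4), (2, 4)]
```

The block labels themselves are arbitrary. Any relabeling that induces the same partition, such as `["x", "y", "x", "z", "x"]`, gives the same output. This is the sense in which a linkage is a clustering of records rather than an assignment of specific labels.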