Causal discovery in heterogeneous environments under the sparse mechanism shift hypothesis

R Perry, J Von Kügelgen… - Advances in Neural …, 2022 - proceedings.neurips.cc
Machine learning approaches commonly rely on the assumption of independent
and identically distributed (iid) data. In reality, however, this assumption is almost always …

Eliminating accidental deviations to minimize generalization error and maximize replicability: Applications in connectomics and genomics

EW Bridgeford, S Wang, Z Wang, T Xu… - PLoS computational …, 2021 - journals.plos.org
Replicability, the ability to replicate scientific findings, is a prerequisite for scientific discovery
and clinical utility. Troublingly, we are in the midst of a replicability crisis. A key to …

Learning sources of variability from high-dimensional observational studies

EW Bridgeford, J Chung, B Gilbert, S Panda… - arXiv preprint arXiv …, 2023 - arxiv.org
Causal inference studies whether the presence of a variable influences an observed
outcome. As measured by quantities such as the "average treatment effect," this paradigm is …

A regression perspective on generalized distance covariance and the Hilbert–Schmidt independence criterion

D Edelmann, J Goeman - Statistical Science, 2022 - projecteuclid.org
Statistical Science 2022, Vol. 37, No. 4, 562–579. https://doi.org/10.1214/21-STS841 …

hyppo: A multivariate hypothesis testing Python package

S Panda, S Palaniappan, J Xiong… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce hyppo, a unified library for performing multivariate hypothesis testing,
including independence, two-sample, and k-sample testing. While many multivariate …

Valid two‐sample graph testing via optimal transport procrustes and multiscale graph correlation with applications in connectomics

J Chung, B Varjavand, J Arroyo‐Relión, A Alyakin… - Stat, 2022 - Wiley Online Library
Testing whether two graphs come from the same distribution is of interest in many real‐world
scenarios, including brain network analysis. Under the random dot product graph model, the …

Goodness-of-Fit and Clustering of Spherical Data: the QuadratiK package in R and Python

G Saraceno, M Markatou, R Mukhopadhyay… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the QuadratiK package that incorporates innovative data analysis
methodologies. The presented software, implemented in both R and Python, offers a …

Private measurement of nonlinear correlations between data hosted across multiple parties

P Vepakomma, SN Pushpita, R Raskar - arXiv preprint arXiv:2110.09670, 2021 - arxiv.org
We introduce a differentially private method to measure nonlinear correlations between
sensitive data hosted across two entities. We provide utility guarantees of our private …

A Kernel Measure of Dissimilarity between M Distributions

Z Huang, B Sen - Journal of the American Statistical Association, 2024 - Taylor & Francis
Given M ≥ 2 distributions defined on a general measurable space, we introduce a
nonparametric (kernel) measure of multi-sample dissimilarity (KMD)—a parameter that …

Energy distance and kernel mean embeddings for two-sample survival testing

M Matabuena, OHM Padilla - arXiv preprint arXiv:1912.04160, 2019 - arxiv.org
We study the comparison problem of distribution equality between two random samples
under a right censoring scheme. To address this problem, we design a series of tests based …