Confidence intervals for policy evaluation in adaptive experiments

V Hadad, DA Hirshberg, R Zhan… - Proceedings of the …, 2021 - National Acad Sciences
Adaptive experimental designs can dramatically improve efficiency in randomized trials. But
with adaptively collected data, common estimators based on sample means and inverse …

Off-policy evaluation via adaptive weighting with data from contextual bandits

R Zhan, V Hadad, DA Hirshberg, S Athey - Proceedings of the 27th ACM …, 2021 - dl.acm.org
It has become increasingly common for data to be collected adaptively, for example using
contextual bandits. Historical data of this type can be used to evaluate other treatment …

Response-adaptive randomization in clinical trials: from myths to practical considerations

DS Robertson, KM Lee… - Statistical science: a …, 2023 - ncbi.nlm.nih.gov
Response-Adaptive Randomization (RAR) is part of a wider class of data-
dependent sampling algorithms, for which clinical trials are typically used as a motivating …

A closer look at the worst-case behavior of multi-armed bandit algorithms

A Kalvit, A Zeevi - Advances in Neural Information …, 2021 - proceedings.neurips.cc
One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB)
problem is the difference between mean rewards in the top two arms, also known as the …

An evaluation of synthetic data augmentation for mitigating covariate bias in health data

L Juwara, A El-Hussuna, K El Emam - Patterns, 2024 - cell.com
Data bias is a major concern in biomedical research, especially when evaluating large-scale
observational datasets. It leads to imprecise predictions and inconsistent estimates in …

A unified framework for bandit multiple testing

Z Xu, R Wang, A Ramdas - Advances in Neural Information …, 2021 - proceedings.neurips.cc
In bandit multiple hypothesis testing, each arm corresponds to a different null hypothesis that
we wish to test, and the goal is to design adaptive algorithms that correctly identify large set …

Reciprocal learning

J Rodemann, C Jansen, G Schollmeyer - arXiv preprint arXiv:2408.06257, 2024 - arxiv.org
We demonstrate that a wide array of machine learning algorithms are specific instances of
one single paradigm: reciprocal learning. These instances range from active learning over …

Entropy regularization for population estimation

B Chugg, P Henderson, J Goldin, DE Ho - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Entropy regularization is known to improve exploration in sequential decision-making
problems. We show that this same mechanism can also lead to nearly unbiased and lower …

Asymptotic expansion for batched bandits

Y Park, N Yoshida - arXiv preprint arXiv:2304.04170, 2023 - arxiv.org
In bandit algorithms, the randomly time-varying adaptive experimental design makes it
difficult to apply traditional limit theorems to off-policy evaluation of the treatment effect …

Reinforcement learning in modern biostatistics: benefits, challenges and new proposals

N Deliu - 2021 - iris.uniroma1.it
Applications of reinforcement learning (RL) for supporting, managing and improving
decision-making are becoming increasingly popular in a variety of medicine and healthcare …