Inference for batched bandits

K Zhang, L Janson, S Murphy - Advances in neural …, 2020 - proceedings.neurips.cc
As bandit algorithms are increasingly utilized in scientific studies and industrial applications,
there is an associated increasing need for reliable inference methods based on the resulting …

Off-policy evaluation via adaptive weighting with data from contextual bandits

R Zhan, V Hadad, DA Hirshberg, S Athey - Proceedings of the 27th ACM …, 2021 - dl.acm.org
It has become increasingly common for data to be collected adaptively, for example using
contextual bandits. Historical data of this type can be used to evaluate other treatment …

Response-adaptive randomization in clinical trials: from myths to practical considerations

DS Robertson, KM Lee… - Statistical science: a …, 2023 - ncbi.nlm.nih.gov
Response-Adaptive Randomization (RAR) is part of a wider class of data-dependent
sampling algorithms, for which clinical trials are typically used as a motivating …

Policy learning with adaptively collected data

R Zhan, Z Ren, S Athey, Z Zhou - Management Science, 2024 - pubsonline.informs.org
In a wide variety of applications, including healthcare, bidding in first price auctions, digital
recommendations, and online education, it can be beneficial to learn a policy that assigns …

A closer look at the worst-case behavior of multi-armed bandit algorithms

A Kalvit, A Zeevi - Advances in Neural Information …, 2021 - proceedings.neurips.cc
One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB)
problem is the difference between mean rewards in the top two arms, also known as the …

Metalearning linear bandits by prior update

A Peleg, N Pearl, R Meir - International Conference on …, 2022 - proceedings.mlr.press
Fully Bayesian approaches to sequential decision-making assume that problem parameters
are generated from a known prior. In practice, such information is often lacking. This problem …

A unified framework for bandit multiple testing

Z Xu, R Wang, A Ramdas - Advances in Neural Information …, 2021 - proceedings.neurips.cc
In bandit multiple hypothesis testing, each arm corresponds to a different null hypothesis that
we wish to test, and the goal is to design adaptive algorithms that correctly identify large set …

Near-optimal inference in adaptive linear regression

K Khamaru, Y Deshpande, T Lattimore… - arXiv preprint arXiv …, 2021 - arxiv.org
When data is collected in an adaptive manner, even simple methods like ordinary least
squares can exhibit non-normal asymptotic behavior. As an undesirable consequence …

Adaptive linear estimating equations

M Ying, K Khamaru, CH Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
Sequential data collection has emerged as a widely adopted technique for enhancing the
efficiency of data gathering processes. Despite its advantages, such data collection …

Safe optimal design with applications in off-policy learning

R Zhu, B Kveton - International Conference on Artificial …, 2022 - proceedings.mlr.press
Motivated by practical needs in online experimentation and off-policy learning, we study the
problem of safe optimal design, where we develop a data logging policy that efficiently …