It has become increasingly common for data to be collected adaptively, for example using contextual bandits. Historical data of this type can be used to evaluate other treatment …
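The evaluation this snippet alludes to is typically done by reweighting logged rewards. A minimal sketch, assuming logged tuples of (context, action, reward, logging propensity) and a known target policy (all names here are illustrative, not the paper's notation):

```python
import random

def ipw_value(logs, target_policy):
    """Inverse-propensity-weighted estimate of a target policy's value
    from logged (context, action, reward, propensity) tuples."""
    total = 0.0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) often
        # the target policy would have taken the logged action.
        total += target_policy(context, action) / propensity * reward
    return total / len(logs)

# Toy logs: two actions, collected by a uniform-random policy (propensity 0.5).
random.seed(0)
logs = []
for _ in range(10000):
    context = random.random()
    action = random.randint(0, 1)
    reward = 1.0 if action == 1 else 0.0  # action 1 is always better here
    logs.append((context, action, reward, 0.5))

# Hypothetical target policy: always take action 1, so its true value is 1.0.
def always_one(context, action):
    return 1.0 if action == 1 else 0.0

estimate = ipw_value(logs, always_one)
```

On these toy logs the estimate lands close to the target policy's true value of 1.0 even though the logging policy was random; adaptive collection complicates the variance of this estimator, which is what the literature above studies.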
DS Robertson, KM Lee… - Statistical science: a …, 2023 - ncbi.nlm.nih.gov
Response-Adaptive Randomization (RAR) is part of a wider class of data-dependent sampling algorithms, for which clinical trials are typically used as a motivating …
A Kalvit, A Zeevi - Advances in Neural Information …, 2021 - proceedings.neurips.cc
One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the …
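The gap-dependence this snippet points to can be seen in the standard regret decomposition: regret equals the gap times the expected number of suboptimal pulls. A toy illustration with a plain UCB1 run (not the paper's analysis; arm means are made up):

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return per-arm pull counts."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            # Pick the arm maximizing empirical mean + exploration bonus.
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
    return counts

means = [0.9, 0.2]                # gap between the top two arms: 0.7
counts = ucb1(means, horizon=5000)
gap = means[0] - means[1]
# Regret decomposition: regret = gap * (pulls of the suboptimal arm),
# so a small gap means many suboptimal pulls and high regret.
regret = gap * counts[1]
```

With a large gap the suboptimal arm is pulled only a logarithmic number of times; shrinking the gap inflates the suboptimal pull count, which is exactly the source of complexity the snippet describes.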
Data bias is a major concern in biomedical research, especially when evaluating large-scale observational datasets. It leads to imprecise predictions and inconsistent estimates in …
Z Xu, R Wang, A Ramdas - Advances in Neural Information …, 2021 - proceedings.neurips.cc
In bandit multiple hypothesis testing, each arm corresponds to a different null hypothesis that we wish to test, and the goal is to design adaptive algorithms that correctly identify a large set …
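The setup can be caricatured as follows: each arm's null hypothesis is "mean at most some threshold", and an adaptive sampler rejects the nulls whose lower confidence bound clears the threshold. A minimal Hoeffding-bound sketch under that assumption (this is an illustration, not the algorithm from the paper; all parameters are made up):

```python
import math
import random

def identify_arms(means, threshold, budget, delta=0.05, seed=0):
    """Adaptively sample arms; reject the null 'mean <= threshold' for any
    arm whose Hoeffding lower confidence bound ends above the threshold."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    width = lambda a: math.sqrt(math.log(2 * k * budget / delta)
                                / (2 * counts[a]))

    active = list(range(k))
    for _ in range(budget):
        # Sample the active arm we are least certain about (fewest pulls).
        arm = min(active, key=lambda a: counts[a])
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
        # Stop sampling arms that are confidently below the threshold.
        if sums[arm] / counts[arm] + width(arm) < threshold:
            active.remove(arm)
    # Reject the null for arms whose lower confidence bound clears it.
    return [a for a in active
            if counts[a] > 0 and sums[a] / counts[a] - width(a) > threshold]

rejected = identify_arms([0.9, 0.85, 0.3, 0.2], threshold=0.5, budget=4000)
```

The adaptivity is what makes error control delicate: the per-arm sample sizes are random and data-dependent, which is why specialized guarantees (rather than fixed-sample tests) are needed.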
We demonstrate that a wide array of machine learning algorithms are specific instances of one single paradigm: reciprocal learning. These instances range from active learning over …
Entropy regularization is known to improve exploration in sequential decision-making problems. We show that this same mechanism can also lead to nearly unbiased and lower …
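The exploration effect of entropy regularization can be seen in a softmax policy-gradient bandit: adding an entropy bonus to the objective keeps every action probability bounded away from zero, so no action's data stream ever dries up. A minimal sketch (illustrative hyperparameters, not the paper's estimator):

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy_regularized_bandit(means, tau, steps, lr=0.1, seed=0):
    """REINFORCE on a softmax bandit policy, maximizing E[reward] + tau*H(pi).
    The entropy gradient pulls the policy toward uniform, so probabilities
    stay bounded away from 0 and exploration never stops."""
    rng = random.Random(seed)
    logits = [0.0] * len(means)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current softmax policy.
        u, acc, arm = rng.random(), 0.0, len(means) - 1
        for a, p in enumerate(probs):
            acc += p
            if u <= acc:
                arm = a
                break
        reward = 1.0 if rng.random() < means[arm] else 0.0
        ent = -sum(p * math.log(p) for p in probs)
        for a, p in enumerate(probs):
            score = (1.0 if a == arm else 0.0) - p    # d log pi(arm)/d logit_a
            grad_entropy = -p * (math.log(p) + ent)   # exact d H(pi)/d logit_a
            logits[a] += lr * (reward * score + tau * grad_entropy)
    return softmax(logits)

final = entropy_regularized_bandit([0.8, 0.2], tau=0.5, steps=2000)
```

With tau > 0 the stationary policy is a softened softmax over the arm values rather than a point mass on the best arm, which is the mechanism the snippet credits for improved downstream estimation.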
Y Park, N Yoshida - arXiv preprint arXiv:2304.04170, 2023 - arxiv.org
In bandit algorithms, the randomly time-varying adaptive experimental design makes it difficult to apply traditional limit theorems to off-policy evaluation of the treatment effect …
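A concrete symptom of the difficulty this snippet describes: under adaptive allocation, the naive per-arm sample mean is biased, so the usual CLT-based inference does not apply directly. A toy simulation (not the paper's estimator; horizon and replication counts are arbitrary) with a pure greedy bandit on two arms that share the same true mean:

```python
import random

def greedy_run(p, horizon, rng):
    """Pure greedy bandit on two Bernoulli(p) arms; return arm 0's
    terminal empirical mean."""
    counts = [0, 0]
    sums = [0.0, 0.0]
    for t in range(horizon):
        if t < 2:
            arm = t  # one initial pull per arm
        else:
            m0, m1 = sums[0] / counts[0], sums[1] / counts[1]
            arm = 0 if m0 > m1 else 1 if m1 > m0 else rng.randint(0, 1)
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < p else 0.0
    return sums[0] / counts[0]

rng = random.Random(0)
reps = 4000
avg = sum(greedy_run(0.5, 50, rng) for _ in range(reps)) / reps
# avg falls well below the true mean of 0.5: an arm abandoned after early
# bad luck keeps its unluckily low sample mean forever, producing negative
# bias. This data-dependent sampling is what breaks traditional limit
# theorems for off-policy evaluation.
```

Correcting for this randomly time-varying design (e.g. via adaptive weighting or martingale limit theory) is the subject of the line of work the snippet belongs to.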
Applications of reinforcement learning (RL) for supporting, managing and improving decision-making are becoming increasingly popular in a variety of medicine and healthcare …