A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits is a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous body of work has accumulated over the …
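To make the framework concrete, here is a minimal sketch of the bandit protocol the snippet describes, using an epsilon-greedy rule over Bernoulli arms; the rule, the reward model, and all names are illustrative assumptions, not taken from the text above.

```python
import random

def epsilon_greedy(arm_means, horizon=10_000, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy bandit: with probability epsilon pull a random
    arm (explore), otherwise pull the arm with the best empirical mean so far
    (exploit). Assumes Bernoulli rewards for illustration."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # pulls per arm
    totals = [0.0] * k    # summed rewards per arm
    reward_sum = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(k)  # explore (also covers unpulled arms)
        else:
            arm = max(range(k), key=lambda a: totals[a] / counts[a])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        totals[arm] += reward
        reward_sum += reward
    return reward_sum / horizon

print(epsilon_greedy([0.3, 0.5, 0.7]))  # average reward approaches the best mean
```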
This paper studies systematic exploration for reinforcement learning (RL) with rich observations and function approximation. We introduce contextual decision processes …
D Foster, A Rakhlin - International Conference on Machine …, 2020 - proceedings.mlr.press
A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning …
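One known action-selection rule in this line of work is inverse gap weighting, used by regression-oracle reductions such as SquareCB: actions whose predicted reward is close to the best get more probability mass. The sketch below assumes predictions come from some external regression oracle; the parameter names are illustrative.

```python
import random

def inverse_gap_weighting(predictions, gamma=100.0, seed=None):
    """Turn per-action reward predictions into an exploration distribution:
    p(a) = 1 / (K + gamma * (y_hat(best) - y_hat(a))) for a != best, with the
    remaining probability mass placed on the greedy action."""
    k = len(predictions)
    best = max(range(k), key=lambda a: predictions[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            probs[a] = 1.0 / (k + gamma * (predictions[best] - predictions[a]))
    probs[best] = 1.0 - sum(probs)  # leftover mass on the greedy action
    return random.Random(seed).choices(range(k), weights=probs)[0], probs
```

As gamma grows, mass concentrates on the greedy action; a small gamma spreads exploration more evenly, which is how such reductions keep the per-round cost at the level of a single regression call.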
Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option …
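A classical way to manage this balance is optimism via upper confidence bounds. A minimal UCB1 sketch follows, again assuming Bernoulli rewards for illustration; it is not taken from the snippet.

```python
import math
import random

def ucb1(arm_means, horizon=10_000, seed=0):
    """UCB1: play each arm once, then pull the arm maximizing
    empirical mean + sqrt(2 ln t / n_a). The bonus shrinks as an arm is
    sampled, so under-explored arms are tried before settling on a winner."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts, totals = [0] * k, [0.0] * k
    reward_sum = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(k), key=lambda a:
                      totals[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        reward_sum += reward
    return reward_sum / horizon
```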
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and …
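The interaction protocol this snippet describes can be written as a short loop. The policy and environment interfaces below are hypothetical placeholders, not the paper's algorithm.

```python
import random

class UniformPolicy:
    """Placeholder learner: plays uniformly at random and never updates.
    A real learner would adapt its action-selection rule from feedback."""
    def __init__(self, k, seed=0):
        self.k = k
        self.rng = random.Random(seed)
    def choose(self, context):
        return self.rng.randrange(self.k)
    def update(self, context, action, reward):
        pass

def contextual_bandit_loop(policy, draw_context, reward_fn, horizon):
    """Each round: observe a context, choose one of K actions, and see the
    reward of the chosen action only (bandit feedback), then update."""
    history = []
    for _ in range(horizon):
        x = draw_context()    # observe context x_t
        a = policy.choose(x)  # choose a_t in {0, ..., K-1}
        r = reward_fn(x, a)   # bandit feedback: reward for a_t only
        policy.update(x, a, r)
        history.append((x, a, r))
    return history

# Toy usage: reward 1 when the action matches the context's bucket.
hist = contextual_bandit_loop(UniformPolicy(k=5),
                              draw_context=lambda: random.random(),
                              reward_fn=lambda x, a: float(a == int(x * 5)),
                              horizon=1000)
```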
Multi-armed bandit problems are the predominant theoretical model of exploration-exploitation tradeoffs in learning, and they have countless applications ranging from medical …
Standard experimental designs are geared toward point estimation and hypothesis testing, while bandit algorithms are geared toward in-sample outcomes. Here, we instead consider …
A major research direction in contextual bandits is to develop algorithms that are computationally efficient, yet support flexible, general-purpose function approximation …
How can we take advantage of opportunities for experimental parallelization in exploration-exploitation tradeoffs? In many experimental scenarios, it is often desirable to …
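A common formalization of such parallelization is the batched setting, where a whole batch of pulls is committed before any of its feedback arrives. The sketch below reuses the epsilon-greedy rule from earlier under that constraint; the batch structure and parameters are assumptions for illustration.

```python
import random

def batched_epsilon_greedy(arm_means, batches=100, batch_size=100,
                           epsilon=0.1, seed=0):
    """Batched bandit sketch: all pulls within a batch are chosen in parallel
    from the statistics available when the batch starts; rewards are only
    incorporated once the whole batch completes."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts, totals = [0] * k, [0.0] * k
    for _ in range(batches):
        # Commit the whole batch up front: no within-batch feedback.
        if 0 in counts:
            choices = [rng.randrange(k) for _ in range(batch_size)]
        else:
            greedy = max(range(k), key=lambda a: totals[a] / counts[a])
            choices = [rng.randrange(k) if rng.random() < epsilon else greedy
                       for _ in range(batch_size)]
        # Observe all batch rewards at once, then update the statistics.
        for arm in choices:
            counts[arm] += 1
            totals[arm] += 1.0 if rng.random() < arm_means[arm] else 0.0
    return [totals[a] / max(counts[a], 1) for a in range(k)]
```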