D Foster, A Rakhlin - International Conference on Machine …, 2020 - proceedings.mlr.press
A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning …
A Modi, N Jiang, A Tewari… - … Conference on Artificial …, 2020 - proceedings.mlr.press
Reinforcement learning (RL) methods have been shown to be capable of learning intelligent behavior in rich domains. However, this has largely been done in simulated domains without …
A major research direction in contextual bandits is to develop algorithms that are computationally efficient, yet support flexible, general-purpose function approximation …
CY Wei, C Dann, J Zimmert - International Conference on …, 2022 - proceedings.mlr.press
We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …
A Pacchiano, M Phan… - Advances in …, 2020 - proceedings.neurips.cc
We study bandit model selection in stochastic environments. Our approach relies on a master algorithm that selects between candidate base algorithms. We develop a master …
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of nonstationary stochastic bandit settings. These settings …
In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good policy without interacting with the environment. A major challenge in applying such methods …
We propose a framework for model selection by combining base algorithms in stochastic bandits and reinforcement learning. We require a candidate regret bound for each base …
Materials development is slow and expensive, taking decades from inception to fielding. For materials research at extremes, the situation is even more demanding, as the desired …