Dive into deep learning

A Zhang, ZC Lipton, M Li, AJ Smola - arXiv preprint arXiv:2106.11342, 2021 - arxiv.org
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …

Beyond UCB: Optimal and efficient contextual bandits with regression oracles

D Foster, A Rakhlin - International Conference on Machine Learning, 2020 - proceedings.mlr.press
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …

Sample complexity of reinforcement learning using linearly combined model ensembles

A Modi, N Jiang, A Tewari… - International Conference on Artificial Intelligence and Statistics, 2020 - proceedings.mlr.press
Reinforcement learning (RL) methods have been shown to be capable of learning intelligent
behavior in rich domains. However, this has largely been done in simulated domains without …

Adapting to misspecification in contextual bandits

DJ Foster, C Gentile, M Mohri… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …

A model selection approach for corruption robust reinforcement learning

CY Wei, C Dann, J Zimmert - International Conference on Algorithmic Learning Theory, 2022 - proceedings.mlr.press
We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …

Model selection in contextual stochastic bandit problems

A Pacchiano, M Phan… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
We study bandit model selection in stochastic environments. Our approach relies on a
master algorithm that selects between candidate base algorithms. We develop a master …

Hedging the drift: Learning to optimize under nonstationarity

WC Cheung, D Simchi-Levi, R Zhu - Management Science, 2022 - pubsonline.informs.org
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic
regret bounds for a collection of nonstationary stochastic bandit settings. These settings …

Oracle inequalities for model selection in offline reinforcement learning

JN Lee, G Tucker, O Nachum, B Dai… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good
policy without interacting with the environment. A major challenge in applying such methods …

Dynamic balancing for model selection in bandits and RL

A Cutkosky, C Dann, A Das, C Gentile… - International Conference on Machine Learning, 2021 - proceedings.mlr.press
We propose a framework for model selection by combining base algorithms in stochastic
bandits and reinforcement learning. We require a candidate regret bound for each base …

Artificial intelligence for materials research at extremes

B Maruyama, J Hattrick-Simpers, W Musinski… - MRS Bulletin, 2022 - Springer
Materials development is slow and expensive, taking decades from inception to fielding. For
materials research at extremes, the situation is even more demanding, as the desired …