Dive into deep learning

A Zhang, ZC Lipton, M Li, AJ Smola - arXiv preprint arXiv:2106.11342, 2021 - arxiv.org
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …

Beyond UCB: Optimal and efficient contextual bandits with regression oracles

D Foster, A Rakhlin - International Conference on Machine Learning, 2020 - proceedings.mlr.press
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …

Sample complexity of reinforcement learning using linearly combined model ensembles

A Modi, N Jiang, A Tewari… - International Conference on Artificial Intelligence and Statistics, 2020 - proceedings.mlr.press
Reinforcement learning (RL) methods have been shown to be capable of learning intelligent
behavior in rich domains. However, this has largely been done in simulated domains without …

Adapting to misspecification in contextual bandits

DJ Foster, C Gentile, M Mohri… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …

A model selection approach for corruption robust reinforcement learning

CY Wei, C Dann, J Zimmert - International Conference on Algorithmic Learning Theory, 2022 - proceedings.mlr.press
We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …

Model selection in contextual stochastic bandit problems

A Pacchiano, M Phan… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
We study bandit model selection in stochastic environments. Our approach relies on a
master algorithm that selects between candidate base algorithms. We develop a master …

Hedging the drift: Learning to optimize under nonstationarity

WC Cheung, D Simchi-Levi, R Zhu - Management Science, 2022 - pubsonline.informs.org
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic
regret bounds for a collection of nonstationary stochastic bandit settings. These settings …

Oracle inequalities for model selection in offline reinforcement learning

JN Lee, G Tucker, O Nachum, B Dai… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good
policy without interacting with the environment. A major challenge in applying such methods …

Dynamic balancing for model selection in bandits and RL

A Cutkosky, C Dann, A Das, C Gentile… - International Conference on Machine Learning, 2021 - proceedings.mlr.press
We propose a framework for model selection by combining base algorithms in stochastic
bandits and reinforcement learning. We require a candidate regret bound for each base …

Artificial intelligence for materials research at extremes

B Maruyama, J Hattrick-Simpers, W Musinski… - MRS Bulletin, 2022 - Springer
Materials development is slow and expensive, taking decades from inception to fielding. For
materials research at extremes, the situation is even more demanding, as the desired …