Simple regret minimization for contextual bandits - Academic Resource Search

Simple regret minimization for contextual bandits

AA Deshmukh, S Sharma, JW Cutler… - arXiv preprint arXiv …, 2018 - arxiv.org

… bounding Simple regret for contextual bandits is … simple regret minimization in the contextual
bandit setting. We propose the Contextual-Gap algorithm, give a regret bound for the simple …

Cited by 23 · Related articles

[PDF] github.io

[PDF] Simple regret minimization for contextual bandits using bayesian optimal experimental design

M Jörke, J Lee, E Brunskill - … Design and Active Learning in the …, 2022 - realworldml.github.io

… We consider a stochastic contextual bandit model where each context s ∈ S is independently
sampled from a distribution ρ. We assume that ρ is known or that it can be approximated …

Cited by 5 · Related articles

[PDF] neurips.cc

Proportional response: Contextual bandits for simple and cumulative regret minimization

SK Krishnamurthy, R Zhan, S Athey… - Advances in Neural …, 2023 - proceedings.neurips.cc

… efficient bandit algorithms for the stochastic contextual bandit setting, … regret minimization
(where we establish near-optimal minimax guarantees) versus simple regret minimization (…

Cited by 10 · Related articles · All 6 versions

[PDF] mlr.press

Contexts can be cheap: Solving stochastic contextual bandits with linear bandit algorithms

OA Hanna, L Yang, C Fragouli - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

… space (context) At. That is, we can think of linear bandits as single-context contextual bandits,
… For example, while linear bandits are used in recommendation systems where the set of …

Cited by 14 · Related articles · All 5 versions

[PDF] neurips.cc

Instance-optimal pac algorithms for contextual bandits

Z Li, L Ratliff, KG Jamieson… - Advances in Neural …, 2022 - proceedings.neurips.cc

… Minimax regret bounds for general policy classes The vast majority of research in contextual
bandits focuses on regret minimization. That is, for a time horizon T, the goal of the player is …

Cited by 27 · Related articles · All 12 versions

[PDF] mlr.press

Taming the monster: A fast and simple algorithm for contextual bandits

A Agarwal, D Hsu, S Kale, J Langford… - International …, 2014 - proceedings.mlr.press

… We present a new algorithm for the contextual bandit learning problem, where the learner
repeatedly takes one of K actions in response to the observed context, and observes the …

Cited by 597 · Related articles · All 19 versions

[PDF] arxiv.org

No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization

M Bernasconi, M Castiglioni, A Celli - arXiv preprint arXiv:2405.06575, 2024 - arxiv.org

… used to describe the contextual bandits with linear constraints … contextual bandits with
regression oracles. In this setting, the decision maker observes a context zt ∈ Z from some context …

Cited by 2 · Related articles · All 2 versions

[PDF] mlr.press

Contextual bandits with smooth regret: Efficient learning in continuous action spaces

Y Zhu, P Mineiro - International Conference on Machine …, 2022 - proceedings.mlr.press

… Designing efficient general-purpose contextual bandit algorithms that … regret guarantees can
be hopeless, alternative regret no… We propose a smooth regret notion for contextual bandits, …

Cited by 16 · Related articles · All 3 versions

[PDF] neurips.cc

Model selection for contextual bandits

DJ Foster, A Krishnamurthy… - Advances in Neural …, 2019 - proceedings.neurips.cc

… selection in contextual bandits, a simple interactive … contextual bandit learning, where a
learner must balance exploration and exploitation to make decisions online? Contextual bandit …

Cited by 104 · Related articles · All 10 versions

[PDF] aaai.org

Meta-learning for simple regret minimization

J Azizi, B Kveton, M Ghavamzadeh… - Proceedings of the AAAI …, 2023 - ojs.aaai.org

… In this section, we consider linear contextual bandits. Suppose that each arm a ∈ A is a
vector in R^d and |A| = K. Also, assume ν_s(a; µ_s) = N(a^⊤ µ_s, σ^2), i.e., with a little abuse of notation µ_s(…

Cited by 11 · Related articles · All 5 versions
