Simple regret minimization for contextual bandits

AA Deshmukh, S Sharma, JW Cutler… - arXiv preprint arXiv …, 2018 - arxiv.org
… bounding Simple regret for contextual bandits is … simple regret minimization in the contextual
bandit setting. We propose the Contextual-Gap algorithm, give a regret bound for the simple

Simple regret minimization for contextual bandits using Bayesian optimal experimental design

M Jörke, J Lee, E Brunskill - … Design and Active Learning in the …, 2022 - realworldml.github.io
… We consider a stochastic contextual bandit model where each context s ∈ S is independently
sampled from a distribution ρ. We assume that ρ is known or that it can be approximated …

Proportional response: Contextual bandits for simple and cumulative regret minimization

SK Krishnamurthy, R Zhan, S Athey… - Advances in Neural …, 2023 - proceedings.neurips.cc
… efficient bandit algorithms for the stochastic contextual bandit setting, … regret minimization
(where we establish near-optimal minimax guarantees) versus simple regret minimization (…

Contexts can be cheap: Solving stochastic contextual bandits with linear bandit algorithms

OA Hanna, L Yang, C Fragouli - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
… space (context) At. That is, we can think of linear bandits as single-context contextual bandits,
… For example, while linear bandits are used in recommendation systems where the set of …

Instance-optimal PAC algorithms for contextual bandits

Z Li, L Ratliff, KG Jamieson… - Advances in Neural …, 2022 - proceedings.neurips.cc
… Minimax regret bounds for general policy classes The vast majority of research in contextual
bandits focuses on regret minimization. That is, for a time horizon T, the goal of the player is …

Taming the monster: A fast and simple algorithm for contextual bandits

A Agarwal, D Hsu, S Kale, J Langford… - International …, 2014 - proceedings.mlr.press
… We present a new algorithm for the contextual bandit learning problem, where the learner
repeatedly takes one of K actions in response to the observed context, and observes the …

No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization

M Bernasconi, M Castiglioni, A Celli - arXiv preprint arXiv:2405.06575, 2024 - arxiv.org
… used to describe the contextual bandits with linear constraints … contextual bandits with
regression oracles. In this setting, the decision maker observes a context zt ∈ Z from some context

Contextual bandits with smooth regret: Efficient learning in continuous action spaces

Y Zhu, P Mineiro - International Conference on Machine …, 2022 - proceedings.mlr.press
… Designing efficient general-purpose contextual bandit algorithms that … regret guarantees can
be hopeless, alternative regret no… We propose a smooth regret notion for contextual bandits, …

Model selection for contextual bandits

DJ Foster, A Krishnamurthy… - Advances in Neural …, 2019 - proceedings.neurips.cc
… selection in contextual bandits, a simple interactive … contextual bandit learning, where a
learner must balance exploration and exploitation to make decisions online? Contextual bandit

Meta-learning for simple regret minimization

J Azizi, B Kveton, M Ghavamzadeh… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
… In this section, we consider linear contextual bandits. Suppose that each arm a ∈ A is a
vector in R^d and |A| = K. Also, assume ν_s(a; µ_s) = N(a^⊤µ_s, σ^2), i.e., with a little abuse of notation µ_s(…