M Jörke, J Lee, E Brunskill - … Design and Active Learning in the …, 2022 - realworldml.github.io
… We consider a stochastic contextual bandit model where each context s ∈ S is independently sampled from a distribution ρ. We assume that ρ is known or that it can be approximated …
… efficient bandit algorithms for the stochastic contextual bandit setting, … regret minimization (where we establish near-optimal minimax guarantees) versus simple regret minimization (…
… space (context) At. That is, we can think of linear bandits as single-context contextual bandits, … For example, while linear bandits are used in recommendation systems where the set of …
… Minimax regret bounds for general policy classes. The vast majority of research in contextual bandits focuses on regret minimization. That is, for a time horizon T, the goal of the player is …
… We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the …
… used to describe the contextual bandits with linear constraints … contextual bandits with regression oracles. In this setting, the decision maker observes a context z_t ∈ Z from some context …
Y Zhu, P Mineiro - International Conference on Machine …, 2022 - proceedings.mlr.press
… Designing efficient general-purpose contextual bandit algorithms that … regret guarantees can be hopeless, alternative regret no… We propose a smooth regret notion for contextual bandits, …
… selection in contextual bandits, a simple interactive … contextual bandit learning, where a learner must balance exploration and exploitation to make decisions online? Contextual bandit …
… In this section, we consider linear contextual bandits. Suppose that each arm a ∈ A is a vector in R^d and |A| = K. Also, assume ν_s(a; µ_s) = N(a^⊤ µ_s, σ^2), i.e., with a little abuse of notation µ_s(…