[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

The end of optimism? An asymptotic analysis of finite-armed linear bandits

T Lattimore, C Szepesvári - Artificial Intelligence and …, 2017 - proceedings.mlr.press
Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with
numerous practical applications. Current approaches focus on generalising existing …

Achieving near instance-optimality and minimax-optimality in stochastic and adversarial linear bandits simultaneously

CW Lee, H Luo, CY Wei, M Zhang… - … on Machine Learning, 2021 - proceedings.mlr.press
In this work, we develop linear bandit algorithms that automatically adapt to different
environments. By plugging a novel loss estimator into the optimization problem that …

Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds

T Tsuchiya, S Ito, J Honda - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Adaptivity to the difficulties of a problem is a key property in sequential decision-making
problems to broaden the applicability of algorithms. Follow-the-regularized-leader (FTRL) …

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

Best-of-both-worlds algorithms for partial monitoring

T Tsuchiya, S Ito, J Honda - International Conference on …, 2023 - proceedings.mlr.press
This study considers the partial monitoring problem with $k$-actions and $d$-outcomes
and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded …

Information directed sampling for linear partial monitoring

J Kirschner, T Lattimore… - Conference on Learning …, 2020 - proceedings.mlr.press
Partial monitoring is a rich framework for sequential decision making under uncertainty that
generalizes many well known bandit models, including linear, combinatorial and dueling …

An information-theoretic approach to minimax regret in partial monitoring

T Lattimore, C Szepesvári - Conference on Learning Theory, 2019 - proceedings.mlr.press
We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax
regret under finite-action partial monitoring with no assumptions on the space of signals or …

Linear partial monitoring for sequential decision making: Algorithms, regret bounds and applications

J Kirschner, T Lattimore, A Krause - The Journal of Machine Learning …, 2023 - dl.acm.org
Partial monitoring is an expressive framework for sequential decision-making with an
abundance of applications, including graph-structured and dueling bandits, dynamic pricing …

Learning Fair Division from Bandit Feedback

H Yamada, J Komiyama, K Abe… - … Conference on Artificial …, 2024 - proceedings.mlr.press
This work addresses learning online fair division under uncertainty, where a central planner
sequentially allocates items without precise knowledge of agents' values or utilities …