Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing …
In this work, we develop linear bandit algorithms that automatically adapt to different environments. By plugging a novel loss estimator into the optimization problem that …
T Tsuchiya, S Ito, J Honda - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Adaptivity to the difficulties of a problem is a key property in sequential decision-making problems to broaden the applicability of algorithms. Follow-the-regularized-leader (FTRL) …
We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only …
T Tsuchiya, S Ito, J Honda - International Conference on …, 2023 - proceedings.mlr.press
This study considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded …
Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well-known bandit models, including linear, combinatorial and dueling …
We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under finite-action partial monitoring with no assumptions on the space of signals or …
Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing …
H Yamada, J Komiyama, K Abe… - … Conference on Artificial …, 2024 - proceedings.mlr.press
This work addresses learning online fair division under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents' values or utilities …