Regret lower bound and optimal algorithm in finite stochastic partial monitoring

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3212 · Related articles · All 9 versions

[PDF] mlr.press

The end of optimism? an asymptotic analysis of finite-armed linear bandits

T Lattimore, C Szepesvári - Artificial Intelligence and …, 2017 - proceedings.mlr.press

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with
numerous practical applications. Current approaches focus on generalising existing …

Cited by 144 · Related articles · All 8 versions

[PDF] mlr.press

Achieving near instance-optimality and minimax-optimality in stochastic and adversarial linear bandits simultaneously

CW Lee, H Luo, CY Wei, M Zhang… - … on Machine Learning, 2021 - proceedings.mlr.press

In this work, we develop linear bandit algorithms that automatically adapt to different
environments. By plugging a novel loss estimator into the optimization problem that …

Cited by 53 · Related articles · All 5 versions

[PDF] neurips.cc

Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds

T Tsuchiya, S Ito, J Honda - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Adaptivity to the difficulties of a problem is a key property in sequential decision-making
problems to broaden the applicability of algorithms. Follow-the-regularized-leader (FTRL) …

Cited by 9 · Related articles · All 7 versions

[PDF] mlr.press

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

Cited by 19 · Related articles · All 3 versions

[PDF] mlr.press

Best-of-both-worlds algorithms for partial monitoring

T Tsuchiya, S Ito, J Honda - International Conference on …, 2023 - proceedings.mlr.press

This study considers the partial monitoring problem with $k$-actions and $d$-outcomes
and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded …

Cited by 21 · Related articles · All 4 versions

[PDF] mlr.press

Information directed sampling for linear partial monitoring

J Kirschner, T Lattimore… - Conference on Learning …, 2020 - proceedings.mlr.press

Partial monitoring is a rich framework for sequential decision making under uncertainty that
generalizes many well known bandit models, including linear, combinatorial and dueling …

Cited by 55 · Related articles · All 5 versions

[PDF] mlr.press

An information-theoretic approach to minimax regret in partial monitoring

T Lattimore, C Szepesvári - Conference on Learning Theory, 2019 - proceedings.mlr.press

We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax
regret under finite-action partial monitoring with no assumptions on the space of signals or …

Cited by 70 · Related articles · All 10 versions

[PDF] jmlr.org

Linear partial monitoring for sequential decision making: Algorithms, regret bounds and applications

J Kirschner, T Lattimore, A Krause - The Journal of Machine Learning …, 2023 - dl.acm.org

Partial monitoring is an expressive framework for sequential decision-making with an
abundance of applications, including graph-structured and dueling bandits, dynamic pricing …

Cited by 5 · Related articles · All 4 versions

[PDF] mlr.press

Learning Fair Division from Bandit Feedback

H Yamada, J Komiyama, K Abe… - … Conference on Artificial …, 2024 - proceedings.mlr.press

This work addresses learning online fair division under uncertainty, where a central planner
sequentially allocates items without precise knowledge of agents' values or utilities …

Cited by 4 · Related articles · All 3 versions
