Thompson sampling for stochastic bandits with graph feedback

B Hao, T Lattimore, C Qin - International Conference on …, 2022 - proceedings.mlr.press

Abstract Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …

Cited by 17 - Related articles - All 4 versions

[PDF] mlr.press

Stochastic graphical bandits with heavy-tailed rewards

Y Gou, J Yi, L Zhang - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press

We consider stochastic graphical bandits, where after pulling an arm, the decision maker
observes rewards of not only the chosen arm but also its neighbors in a feedback graph …

Cited by 2 - Related articles - All 5 versions

[PDF] mlr.press

Simultaneously learning stochastic and adversarial bandits with general graph feedback

F Kong, Y Zhou, S Li - International Conference on Machine …, 2022 - proceedings.mlr.press

The problem of online learning with graph feedback has been extensively studied in the
literature due to its generality and potential to model various learning tasks. Existing works …

Cited by 10 - Related articles - All 3 versions

[PDF] mlr.press

Leveraging demonstrations to improve online learning: Quality matters

B Hao, R Jain, T Lattimore… - … on Machine Learning, 2023 - proceedings.mlr.press

We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …

Cited by 6 - Related articles - All 6 versions

[PDF] aaai.org

Information directed sampling for stochastic bandits with graph feedback

F Liu, S Buccapatnam, N Shroff - … of the AAAI Conference on Artificial …, 2018 - ojs.aaai.org

We consider stochastic multi-armed bandit problems with graph feedback, where the
decision maker is allowed to observe the neighboring actions of the chosen action. We allow …

Cited by 47 - Related articles - All 12 versions

[PDF] mlr.press

Small-loss bounds for online learning with partial information

T Lykouris, K Sridharan… - Conference on Learning …, 2018 - proceedings.mlr.press

We consider the problem of adversarial (non-stochastic) online learning with partial
information feedback, where at each round, a decision maker selects an action from a finite …

Cited by 44 - Related articles - All 8 versions

[PDF] arxiv.org

Satisficing in time-sensitive bandit learning

D Russo, B Van Roy - arXiv preprint arXiv:1803.02855, 2018 - arxiv.org

Much of the recent literature on bandit learning focuses on algorithms that aim to converge
on an optimal action. One shortcoming is that this orientation does not account for time …

Cited by 39 - Related articles - All 2 versions

[PDF] mlr.press

Feedback graph regret bounds for Thompson sampling and UCB

T Lykouris, E Tardos, D Wali - Algorithmic Learning Theory, 2020 - proceedings.mlr.press

We study the stochastic multi-armed bandit problem with the graph-based feedback
structure introduced by Mannor and Shamir. We analyze the performance of the two most …

Cited by 29 - Related articles - All 4 versions

[PDF] neurips.cc

Bandits with feedback graphs and switching costs

R Arora, TV Marinov, M Mohri - Advances in Neural …, 2019 - proceedings.neurips.cc

We study the adversarial multi-armed bandit problem where the learner is supplied with
partial observations modeled by a feedback graph and where shifting to a new …

Cited by 32 - Related articles - All 8 versions

[PDF] neurips.cc

Understanding bandits with graph feedback

H Chen, S Li, C Zhang - Advances in Neural Information …, 2021 - proceedings.neurips.cc

The bandit problem with graph feedback, proposed in [Mannor and Shamir, NeurIPS 2011],
is modeled by a directed graph $G=(V, E)$ where $V$ is the collection of bandit arms, and …

Cited by 13 - Related articles - All 6 versions
