Contextual information-directed sampling

B Hao, T Lattimore, C Qin - International Conference on …, 2022 - proceedings.mlr.press
Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …

Stochastic graphical bandits with heavy-tailed rewards

Y Gou, J Yi, L Zhang - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press
We consider stochastic graphical bandits, where after pulling an arm, the decision maker
observes rewards of not only the chosen arm but also its neighbors in a feedback graph …

Simultaneously learning stochastic and adversarial bandits with general graph feedback

F Kong, Y Zhou, S Li - International Conference on Machine …, 2022 - proceedings.mlr.press
The problem of online learning with graph feedback has been extensively studied in the
literature due to its generality and potential to model various learning tasks. Existing works …

Leveraging demonstrations to improve online learning: Quality matters

B Hao, R Jain, T Lattimore… - … on Machine Learning, 2023 - proceedings.mlr.press
We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …

Information directed sampling for stochastic bandits with graph feedback

F Liu, S Buccapatnam, N Shroff - … of the AAAI Conference on Artificial …, 2018 - ojs.aaai.org
We consider stochastic multi-armed bandit problems with graph feedback, where the
decision maker is allowed to observe the neighboring actions of the chosen action. We allow …

Small-loss bounds for online learning with partial information

T Lykouris, K Sridharan… - Conference on Learning …, 2018 - proceedings.mlr.press
We consider the problem of adversarial (non-stochastic) online learning with partial
information feedback, where at each round, a decision maker selects an action from a finite …

Satisficing in time-sensitive bandit learning

D Russo, B Van Roy - arXiv preprint arXiv:1803.02855, 2018 - arxiv.org
Much of the recent literature on bandit learning focuses on algorithms that aim to converge
on an optimal action. One shortcoming is that this orientation does not account for time …

Feedback graph regret bounds for Thompson sampling and UCB

T Lykouris, E Tardos, D Wali - Algorithmic Learning Theory, 2020 - proceedings.mlr.press
We study the stochastic multi-armed bandit problem with the graph-based feedback
structure introduced by Mannor and Shamir. We analyze the performance of the two most …

Bandits with feedback graphs and switching costs

R Arora, TV Marinov, M Mohri - Advances in Neural …, 2019 - proceedings.neurips.cc
We study the adversarial multi-armed bandit problem where the learner is supplied with
partial observations modeled by a feedback graph and where shifting to a new …

Understanding bandits with graph feedback

H Chen, S Li, C Zhang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
The bandit problem with graph feedback, proposed in [Mannor and Shamir, NeurIPS 2011],
is modeled by a directed graph $G=(V, E)$ where $V$ is the collection of bandit arms, and …