Optimizing for the future in non-stationary MDPs

Y Chandak, G Theocharous… - International …, 2020 - proceedings.mlr.press
Most reinforcement learning methods are based upon the key assumption that the transition
dynamics and reward functions are fixed, that is, the underlying Markov decision process is …

Nonstationary reinforcement learning with linear function approximation

H Zhou, J Chen, LR Varshney, A Jagmohan - arXiv preprint arXiv …, 2020 - arxiv.org
We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs)
with linear function approximation under drifting environment. Specifically, both the reward …

Minimax regret for cascading bandits

D Vial, S Sanghavi, S Shakkottai… - Advances in Neural …, 2022 - proceedings.neurips.cc
Cascading bandits is a natural and popular model that frames the task of learning to rank
from Bernoulli click feedback in a bandit setting. For the case of unstructured rewards, we …

Off-policy evaluation for action-dependent non-stationary environments

Y Chandak, S Shankar, N Bastian… - Advances in …, 2022 - proceedings.neurips.cc
Methods for sequential decision-making are often built upon a foundational assumption that
the underlying decision process is stationary. This limits the application of such methods …

Combinatorial semi-bandit in the non-stationary environment

W Chen, L Wang, H Zhao… - Uncertainty in Artificial …, 2021 - proceedings.mlr.press
In this paper, we investigate the non-stationary combinatorial semi-bandit problem, both in
the switching case and in the dynamic case. In the general case where (a) the reward …

High probability latency quickest change detection over a finite horizon

YH Huang, VV Veeravalli - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
A finite horizon variant of
the quickest change detection problem is studied, in which the goal is to minimize a delay …

Adversarial linear contextual bandits with graph-structured side observations

L Wang, B Li, H Zhou, GB Giannakis… - Proceedings of the …, 2021 - ojs.aaai.org
This paper studies the adversarial graphical contextual bandits, a variant of adversarial
multi-armed bandits that leverage two categories of the most common side information: contexts …

Distributed consensus algorithm for decision-making in multi-agent multi-armed bandit

X Cheng, S Maghsudi - IEEE Transactions on Control of …, 2024 - ieeexplore.ieee.org
We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic
environment. A graph reflects the information-sharing structure among agents, and the arms' …

Sequential Change Detection for Learning in Piecewise Stationary Bandit Environments

YH Huang, VV Veeravalli - arXiv preprint arXiv:2501.10974, 2025 - arxiv.org
A finite-horizon variant of the quickest change detection problem is investigated, which is
motivated by a change detection problem that arises in piecewise stationary bandits. The …

High Probability Latency Sequential Change Detection over an Unknown Finite Horizon

YH Huang, VV Veeravalli - arXiv preprint arXiv:2408.05817, 2024 - arxiv.org
A finite horizon variant of the quickest change detection problem is studied, in which the goal
is to minimize a delay threshold (latency), under constraints on the probability of false alarm …