- Academic resource search

Optimizing for the future in non-stationary MDPs

Y Chandak, G Theocharous… - International …, 2020 - proceedings.mlr.press

Most reinforcement learning methods are based upon the key assumption that the transition
dynamics and reward functions are fixed, that is, the underlying Markov decision process is …

Cited by 79 · Related articles · All 11 versions

[PDF] arxiv.org

Nonstationary reinforcement learning with linear function approximation

H Zhou, J Chen, LR Varshney, A Jagmohan - arXiv preprint arXiv …, 2020 - arxiv.org

We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs)
with linear function approximation under drifting environment. Specifically, both the reward …

Cited by 44 · Related articles · All 5 versions

[PDF] neurips.cc

Minimax regret for cascading bandits

D Vial, S Sanghavi, S Shakkottai… - Advances in Neural …, 2022 - proceedings.neurips.cc

Cascading bandits is a natural and popular model that frames the task of learning to rank
from Bernoulli click feedback in a bandit setting. For the case of unstructured rewards, we …

Cited by 17 · Related articles · All 7 versions

[PDF] neurips.cc

Off-policy evaluation for action-dependent non-stationary environments

Y Chandak, S Shankar, N Bastian… - Advances in …, 2022 - proceedings.neurips.cc

Methods for sequential decision-making are often built upon a foundational assumption that
the underlying decision process is stationary. This limits the application of such methods …

Cited by 8 · Related articles · All 7 versions

[PDF] mlr.press

Combinatorial semi-bandit in the non-stationary environment

W Chen, L Wang, H Zhao… - Uncertainty in Artificial …, 2021 - proceedings.mlr.press

In this paper, we investigate the non-stationary combinatorial semi-bandit problem, both in
the switching case and in the dynamic case. In the general case where (a) the reward …

Cited by 24 · Related articles · All 7 versions

High probability latency quickest change detection over a finite horizon

YH Huang, VV Veeravalli - 2024 IEEE International …, 2024 - ieeexplore.ieee.org

A finite horizon variant of
the quickest change detection problem is studied, in which the goal is to minimize a delay …

Cited by 2 · Related articles · All 3 versions

[PDF] aaai.org

Adversarial linear contextual bandits with graph-structured side observations

L Wang, B Li, H Zhou, GB Giannakis… - Proceedings of the …, 2021 - ojs.aaai.org

This paper studies the adversarial graphical contextual bandits, a variant of adversarial multi-
armed bandits that leverage two categories of the most common side information: contexts …

Cited by 9 · Related articles · All 12 versions

[PDF] arxiv.org

Distributed consensus algorithm for decision-making in multi-agent multi-armed bandit

X Cheng, S Maghsudi - IEEE Transactions on Control of …, 2024 - ieeexplore.ieee.org

We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic
environment. A graph reflects the information-sharing structure among agents, and the arms' …

Cited by 1 · Related articles · All 3 versions

[PDF] arxiv.org

Sequential Change Detection for Learning in Piecewise Stationary Bandit Environments

YH Huang, VV Veeravalli - arXiv preprint arXiv:2501.10974, 2025 - arxiv.org

A finite-horizon variant of the quickest change detection problem is investigated, which is
motivated by a change detection problem that arises in piecewise stationary bandits. The …

[PDF] arxiv.org

High Probability Latency Sequential Change Detection over an Unknown Finite Horizon

YH Huang, VV Veeravalli - arXiv preprint arXiv:2408.05817, 2024 - arxiv.org

A finite horizon variant of the quickest change detection problem is studied, in which the goal
is to minimize a delay threshold (latency), under constraints on the probability of false alarm …
