G Xiong, J Li - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
The Whittle index policy is a heuristic for the intractable restless multi-armed bandit (RMAB) problem. Although it is provably asymptotically optimal, finding Whittle indices remains …
We consider experiments in dynamical systems where interventions on some experimental units impact other units through a limiting constraint (such as a limited supply of products) …
S Zhang, H Yao, S Whiteson - International Conference on …, 2021 - proceedings.mlr.press
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously. In …
The focus of this paper is on sample complexity guarantees of average-reward reinforcement learning algorithms, which are known to be more challenging to study than …
Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In …
M Yin, YX Wang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
This work studies the statistical limits of uniform convergence for offline policy evaluation (OPE) problems with model-based methods (for episodic MDPs) and provides a unified …
T Li, F Wu, G Lan - Mathematics of Operations Research, 2024 - pubsonline.informs.org
We study average-reward Markov decision processes (AMDPs) and develop novel first-order methods with strong theoretical guarantees for both policy optimization and policy …
The average reward criterion is relatively less studied, as most existing works in the reinforcement learning literature consider the discounted reward criterion. There are few …
X Chen, X Ma, Y Li, G Yang… - Uncertainty in Artificial …, 2023 - proceedings.mlr.press
Off-policy learning is key to extending reinforcement learning, as it allows a target policy to be learned from data generated by a different behavior policy. However, it is well known as …