We consider the batch (off-line) policy learning problem in the infinite horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy …
G Xiong, J Li - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB) problem. Although it is provably asymptotically optimal, finding Whittle indices remains …
We consider experiments in dynamical systems where interventions on some experimental units impact other units through a limiting constraint (such as a limited supply of products) …
S Zhang, H Yao, S Whiteson - International Conference on …, 2021 - proceedings.mlr.press
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously. In …
Y Zhang, KW Ross - International Conference on Machine …, 2021 - proceedings.mlr.press
We develop theory and algorithms for average-reward on-policy Reinforcement Learning (RL). We first consider bounding the difference of the long-term average reward for two …
S Zhang, Y Wan, RS Sutton… - … conference on machine …, 2021 - proceedings.mlr.press
We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function …
The focus of this paper is on sample complexity guarantees of average-reward reinforcement learning algorithms, which are known to be more challenging to study than …
The main challenge of multiagent reinforcement learning is the difficulty of learning useful policies in the presence of other simultaneously learning agents whose changing behaviors …
Z Liang, X Ma, J Blanchet, J Zhang, Z Zhou - arXiv preprint arXiv …, 2023 - arxiv.org
As a framework for sequential decision-making, Reinforcement Learning (RL) has been regarded as an essential component leading to Artificial General Intelligence (AGI) …