L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history data without active exploration of the environment. To counter the insufficient coverage and …
Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in …
L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper investigates model robustness in reinforcement learning (RL) via the framework of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …
Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …
G Li, Y Wei, Y Chi, Y Gu… - Advances in neural …, 2020 - proceedings.neurips.cc
We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted infinite-horizon Markov decision process (MDP) with state space S and action space A …
J Woo, G Joshi, Y Chi - International Conference on …, 2023 - proceedings.mlr.press
In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone. Focusing on …
Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a stochastic approximation scheme to Markovian data samples. Motivated by the recent …
S Wang, N Si, J Blanchet… - … Conference on Artificial …, 2023 - proceedings.mlr.press
We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes …