L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using historical data without active exploration of the environment. To counter the insufficient coverage and …
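Several of the snippets below share the same offline-RL formalization; as a reading aid, here is a minimal, generic statement of the objective (the notation is ours, not quoted from any of these papers). The learner receives a logged dataset
\[
\mathcal{D}=\{(s_i,a_i,r_i,s_i')\}_{i=1}^{n}\sim\mu \quad\text{($\mu$ is the behavior / logging distribution; no further interaction is allowed)},
\]
must output a policy $\hat\pi$ from $\mathcal{D}$ alone, and is judged by the suboptimality gap
\[
\mathrm{SubOpt}(\hat\pi)\;=\;V^{\pi^\star}(\rho)-V^{\hat\pi}(\rho),
\]
where $\rho$ is the initial-state distribution and $\pi^\star$ an optimal policy.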
We study reinforcement learning (RL) with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear …
J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition …
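Two of the snippets above assume linear MDPs; for reference, the standard definition (commonly attributed to Jin et al., 2020; this is our restatement, not quoted from these abstracts) requires both the transition kernel and the reward to be linear in a known feature map:
\[
P_h(s'\mid s,a)=\langle\phi(s,a),\mu_h(s')\rangle,
\qquad
r_h(s,a)=\langle\phi(s,a),\theta_h\rangle,
\]
where $\phi:\mathcal{S}\times\mathcal{A}\to\mathbb{R}^d$ is known and $\mu_h,\theta_h$ are unknown; "time-inhomogeneous" means $\mu_h,\theta_h$ may differ across steps $h=1,\dots,H$.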
M Yin, YX Wang - Advances in neural information …, 2021 - proceedings.neurips.cc
We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …
Coverage conditions, which assert that the data logging distribution adequately covers the state space, play a fundamental role in determining the sample complexity of offline …
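A standard way to make "adequately covers" quantitative is a concentrability coefficient; the single-policy version below is one common choice (a generic definition, not taken from this snippet):
\[
C^{\pi}\;=\;\sup_{h,s,a}\frac{d_h^{\pi}(s,a)}{\mu_h(s,a)},
\]
where $d_h^{\pi}$ is the state-action occupancy measure of a comparator policy $\pi$ and $\mu_h$ the data distribution. Requiring only $C^{\pi^\star}<\infty$ (coverage of a single optimal policy) is strictly weaker than demanding coverage of all policies.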
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration, but must propose a near-optimal policy …
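In standard notation (again ours, stated generically), the reward-free protocol asks for exploration that is simultaneously good for every reward: after an exploration phase in which no reward signal is observed, the agent must, for any reward function $r$ revealed afterwards, output a policy $\hat\pi_r$ satisfying
\[
V^{\star}_{r}(\rho)-V^{\hat\pi_r}_{r}(\rho)\le\varepsilon .
\]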
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing only previously collected experience, without any online interaction. While it is widely understood …
We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards. We design a new algorithm, Variance-weighted …