Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data …
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …
Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online …
Coverage conditions--which assert that the data logging distribution adequately covers the state space--play a fundamental role in determining the sample complexity of offline …
M Uehara, J Huang, N Jiang - International Conference on …, 2020 - proceedings.mlr.press
We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions. Our …
T Xie, N Jiang - International Conference on Machine …, 2021 - proceedings.mlr.press
We make progress in a long-standing problem of batch reinforcement learning (RL): learning Q* from an exploratory and polynomial-sized dataset, using a realizable and …
We study worst-case guarantees on the expected return of fixed-dataset policy optimization algorithms. Our core contribution is a unified conceptual and mathematical framework for the …