A survey on model-based reinforcement learning

FM Luo, T Xu, H Lai, XH Chen, W Zhang… - Science China Information …, 2024 - Springer
Reinforcement learning (RL) interacts with the environment to solve sequential
decision-making problems via a trial-and-error approach. Errors are always undesirable in real-world …

Bellman eluder dimension: New rich classes of RL problems, and sample-efficient algorithms

C Jin, Q Liu, S Miryoosefi - Advances in neural information …, 2021 - proceedings.neurips.cc
Finding the minimal structural assumptions that empower sample-efficient learning is one of
the most important research directions in Reinforcement Learning (RL). This paper …

Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Policy finetuning: Bridging sample-efficient offline and online reinforcement learning

T Xie, N Jiang, H Wang, C Xiong… - Advances in neural …, 2021 - proceedings.neurips.cc
Recent theoretical work studies sample-efficient reinforcement learning (RL) extensively in
two settings: learning interactively in the environment (online RL), or learning from an offline …

Jump-start reinforcement learning

I Uchendu, T Xiao, Y Lu, B Zhu, M Yan… - International …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) provides a theoretical framework for continuously improving an
agent's behavior via trial and error. However, efficiently learning policies from scratch can be …

Model-based reinforcement learning with value-targeted regression

A Ayoub, Z Jia, C Szepesvari… - … on Machine Learning, 2020 - proceedings.mlr.press
This paper studies model-based reinforcement learning (RL) for regret minimization. We
focus on finite-horizon episodic RL where the transition model $P$ belongs to a known …

Nearly minimax optimal reinforcement learning for linear Markov decision processes

J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation. For episodic
time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition …

The curious price of distributional robustness in reinforcement learning with a generative model

L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260 https://doi.org/10.1214/23-AOS2342 © …