A survey on model-based reinforcement learning

FM Luo, T Xu, H Lai, XH Chen, W Zhang… - Science China Information …, 2024 - Springer
Reinforcement learning (RL) interacts with the environment to solve sequential decision-
making problems via a trial-and-error approach. Errors are always undesirable in real-world …

Randomized ensembled double Q-learning: Learning fast without a model

X Chen, C Wang, Z Zhou, K Ross - arXiv preprint arXiv:2101.05982, 2021 - arxiv.org
Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved
much higher sample efficiency than previous model-free methods for continuous-action DRL …

Dropout Q-functions for doubly efficient reinforcement learning

T Hiraoka, T Imagawa, T Hashimoto, T Onishi… - arXiv preprint arXiv …, 2021 - arxiv.org
Randomized ensembled double Q-learning (REDQ) (Chen et al., 2021b) has recently
achieved state-of-the-art sample efficiency on continuous-action reinforcement learning …

Offline reinforcement learning with reverse model-based imagination

J Wang, W Li, H Jiang, G Zhu, S Li… - Advances in Neural …, 2021 - proceedings.neurips.cc
In offline reinforcement learning (offline RL), one of the main challenges is to deal with the
distributional shift between the learning policy and the given dataset. To address this …

VRL3: A data-driven framework for visual deep reinforcement learning

C Wang, X Luo, K Ross, D Li - Advances in Neural …, 2022 - proceedings.neurips.cc
We propose VRL3, a powerful data-driven framework with a simple design for solving
challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major …

How to fine-tune the model: Unified model shift and model bias policy optimization

H Zhang, H Yu, J Zhao, D Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Designing and deriving effective model-based reinforcement learning (MBRL) algorithms
with a performance improvement guarantee is challenging, mainly attributed to the high …

A survey of temporal credit assignment in deep reinforcement learning

E Pignatelli, J Ferret, M Geist, T Mesnard… - arXiv preprint arXiv …, 2023 - arxiv.org
The Credit Assignment Problem (CAP) refers to the longstanding challenge of
Reinforcement Learning (RL) agents to associate actions with their long-term …

Dynamic-horizon model-based value estimation with latent imagination

J Wang, Q Zhang, D Zhao - IEEE Transactions on Neural …, 2022 - ieeexplore.ieee.org
Existing model-based value expansion (MVE) methods typically leverage a world model for
value estimation with a fixed rollout horizon to assist policy learning. However, a proper …

Live in the moment: Learning dynamics model adapted to evolving policy

X Wang, W Wongkamjan, R Jia… - … on Machine Learning, 2023 - proceedings.mlr.press
Model-based reinforcement learning (RL) often achieves higher sample efficiency
in practice than model-free RL by learning a dynamics model to generate samples for policy …

Q-ensemble for offline RL: Don't scale the ensemble, scale the batch size

A Nikulin, V Kurenkov, D Tarasov, D Akimov… - arXiv preprint arXiv …, 2022 - arxiv.org
Training large neural networks is known to be time-consuming, with the learning duration
taking days or even weeks. To address this problem, large-batch optimization was …