Bidirectional model-based policy optimization

FM Luo, T Xu, H Lai, XH Chen, W Zhang… - Science China Information …, 2024 - Springer

Reinforcement learning (RL) interacts with the environment to solve sequential decision-making problems via a trial-and-error approach. Errors are always undesirable in real-world …

Cited by 109 · Related articles · All 4 versions

[PDF] arxiv.org

Randomized ensembled double Q-learning: Learning fast without a model

X Chen, C Wang, Z Zhou, K Ross - arXiv preprint arXiv:2101.05982, 2021 - arxiv.org

Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved
much higher sample efficiency than previous model-free methods for continuous-action DRL …

Cited by 290 · Related articles · All 7 versions

[PDF] arxiv.org

Dropout Q-functions for doubly efficient reinforcement learning

T Hiraoka, T Imagawa, T Hashimoto, T Onishi… - arXiv preprint arXiv …, 2021 - arxiv.org

Randomized ensembled double Q-learning (REDQ) (Chen et al., 2021b) has recently
achieved state-of-the-art sample efficiency on continuous-action reinforcement learning …

Cited by 113 · Related articles · All 4 versions

[PDF] neurips.cc

Offline reinforcement learning with reverse model-based imagination

J Wang, W Li, H Jiang, G Zhu, S Li… - Advances in Neural …, 2021 - proceedings.neurips.cc

In offline reinforcement learning (offline RL), one of the main challenges is to deal with the
distributional shift between the learning policy and the given dataset. To address this …

Cited by 64 · Related articles · All 6 versions

[PDF] neurips.cc

VRL3: A data-driven framework for visual deep reinforcement learning

C Wang, X Luo, K Ross, D Li - Advances in Neural …, 2022 - proceedings.neurips.cc

We propose VRL3, a powerful data-driven framework with a simple design for solving
challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major …

Cited by 44 · Related articles · All 11 versions

[PDF] neurips.cc

How to fine-tune the model: Unified model shift and model bias policy optimization

H Zhang, H Yu, J Zhao, D Zhang… - Advances in …, 2024 - proceedings.neurips.cc

Designing and deriving effective model-based reinforcement learning (MBRL) algorithms
with a performance improvement guarantee is challenging, mainly attributed to the high …

Cited by 4 · Related articles · All 5 versions

[PDF] arxiv.org

A survey of temporal credit assignment in deep reinforcement learning

E Pignatelli, J Ferret, M Geist, T Mesnard… - arXiv preprint arXiv …, 2023 - arxiv.org

The Credit Assignment Problem (CAP) refers to the longstanding challenge of
Reinforcement Learning (RL) agents to associate actions with their long-term …

Cited by 10 · Related articles · All 3 versions

Dynamic-horizon model-based value estimation with latent imagination

J Wang, Q Zhang, D Zhao - IEEE Transactions on Neural …, 2022 - ieeexplore.ieee.org

Existing model-based value expansion (MVE) methods typically leverage a world model for
value estimation with a fixed rollout horizon to assist policy learning. However, a proper …

Cited by 14 · Related articles · All 3 versions

[PDF] mlr.press

Live in the moment: Learning dynamics model adapted to evolving policy

X Wang, W Wongkamjan, R Jia… - … on Machine Learning, 2023 - proceedings.mlr.press

Model-based reinforcement learning (RL) often achieves higher sample efficiency
in practice than model-free RL by learning a dynamics model to generate samples for policy …

Cited by 17 · Related articles · All 8 versions

[PDF] arxiv.org

Q-ensemble for offline RL: Don't scale the ensemble, scale the batch size

A Nikulin, V Kurenkov, D Tarasov, D Akimov… - arXiv preprint arXiv …, 2022 - arxiv.org

Training large neural networks is known to be time-consuming, with the learning duration
taking days or even weeks. To address this problem, large-batch optimization was …

Cited by 19 · Related articles · All 4 versions
