Model gradient: unified model and policy learning in model-based reinforcement learning

C Jia, F Zhang, T Xu, JC Pang, Z Zhang… - Frontiers of Computer …, 2024 - Springer
Model-based reinforcement learning is a promising direction to improve the sample
efficiency of reinforcement learning with learning a model of the environment. Previous …

A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning

R Wei, N Lambert, A McDonald, A Garcia… - arXiv preprint arXiv …, 2023 - arxiv.org
Model-based Reinforcement Learning (MBRL) aims to make agents more sample-efficient,
adaptive, and explainable by learning an explicit model of the environment. While the …

Gradient estimation in model-based reinforcement learning: a study on linear quadratic environments

ÂG Lovatto, TP Bueno, LN de Barros - Brazilian Conference on Intelligent …, 2021 - Springer
Stochastic Value Gradient (SVG) methods underlie many recent achievements of
model-based Reinforcement Learning agents in continuous state-action spaces. Despite …

Critic-over-Actor-Critic Modeling: Finding Optimal Strategy in ICU Environments

R Ryan, M Shao - 2022 IEEE International Conference on Big …, 2022 - ieeexplore.ieee.org
Reinforcement learning (RL) is mechanized to learn from experience. It solves the problem
in sequential decisions by optimizing reward-punishment through experimentation of the …

Exploration Versus Exploitation in Model-Based Reinforcement Learning: An Empirical Study

ÂG Lovatto, LN de Barros, DD Mauá - Brazilian Conference on Intelligent …, 2022 - Springer
Model-based Reinforcement Learning (MBRL) agents use data collected by
exploration of the environment to produce a model of the dynamics, which is then used to …

Generalisation Ability of Proper Value Equivalence Models in Model-Based Reinforcement Learning

S Bratus - 2024 - repository.tudelft.nl
We investigate the generalization performance of predictive models in model-based
reinforcement learning when trained using maximum likelihood estimation (MLE) versus …