When to use parametric models in reinforcement learning?

HP Van Hasselt, M Hessel… - Advances in Neural …, 2019 - proceedings.neurips.cc
We examine the question of when and how parametric models are most useful in
reinforcement learning. In particular, we look at commonalities and differences between …

Average-reward off-policy policy evaluation with function approximation

S Zhang, Y Wan, RS Sutton… - … conference on machine …, 2021 - proceedings.mlr.press
We consider off-policy policy evaluation with function approximation (FA) in average-reward
MDPs, where the goal is to estimate both the reward rate and the differential value function …

Reward-respecting subtasks for model-based reinforcement learning

RS Sutton, MC Machado, GZ Holland, D Szepesvari… - Artificial Intelligence, 2023 - Elsevier
To achieve the ambitious goals of artificial intelligence, reinforcement learning must include
planning with a model of the world that is abstract in state and time. Deep learning has made …

Forethought and hindsight in credit assignment

V Chelu, D Precup… - Advances in Neural …, 2020 - proceedings.neurips.cc
We address the problem of credit assignment in reinforcement learning and explore
fundamental questions regarding the way in which an agent can best use additional …

Investigating the properties of neural network representations in reinforcement learning

H Wang, E Miahi, M White, MC Machado, Z Abbas… - Artificial Intelligence, 2024 - Elsevier
In this paper we investigate the properties of representations learned by deep reinforcement
learning systems. Much of the early work on representations for reinforcement learning …

Novelty search in representational space for sample efficient exploration

RY Tao, V François-Lavet… - Advances in Neural …, 2020 - proceedings.neurips.cc
We present a new approach for efficient exploration which leverages a low-dimensional
encoding of the environment learned with a combination of model-based and model-free …

Towards evaluating adaptivity of model-based reinforcement learning methods

Y Wan, A Rahimi-Kalahroudi… - International …, 2022 - proceedings.mlr.press
In recent years, a growing number of deep model-based reinforcement learning (RL)
methods have been introduced. The interest in deep model-based RL is not surprising …

Planning, execution, and adaptation for multi-robot systems using probabilistic and temporal planning

Y Carreno, JHA Ng, Y Petillot… - … Agents and Multiagent …, 2022 - researchportal.hw.ac.uk
Planning for multi-robot coordination during long-horizon missions in complex environments
needs to consider resources, temporal constraints, and uncertainty. This could be …

Off-policy maximum entropy reinforcement learning: Soft actor-critic with advantage weighted mixture policy (SAC-AWMP)

Z Hou, K Zhang, Y Wan, D Li, C Fu, H Yu - arXiv preprint arXiv:2002.02829, 2020 - arxiv.org
The optimal policy of a reinforcement learning problem is often discontinuous and non-smooth.
I.e., for two states with similar representations, their optimal policies can be …

Bounding-box inference for error-aware model-based reinforcement learning

EJ Talvitie, Z Shao, H Li, J Hu, J Boerma… - arXiv preprint arXiv …, 2024 - arxiv.org
In model-based reinforcement learning, simulated experiences from the learned model are
often treated as equivalent to experience from the real environment. However, when the …