Learning about rewards and punishments is critical for survival. Classical studies have demonstrated an impressive correspondence between the firing of dopamine neurons in the …
Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have …
We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value …
Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this …
Many reinforcement learning (RL) problems in practice are offline, learning purely from observational data. A key challenge is how to ensure the learned policy is safe, which …
JSO Ceron, PS Castro - International Conference on …, 2021 - proceedings.mlr.press
Since the introduction of DQN, a vast majority of reinforcement learning research has focused on reinforcement learning with deep neural networks as function approximators …
J Duan, Y Guan, SE Li, Y Ren, Q Sun… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
In reinforcement learning (RL), function approximation errors are known to easily lead to Q-value overestimations, thus greatly reducing policy performance. This article presents a …
Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most algorithms, based on temporal differences, replace the true value of a transiting state by their current estimate …
H Sun, L Han, R Yang, X Ma… - Advances in neural …, 2022 - proceedings.neurips.cc
In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a …