A comparative analysis of expected and distributional reinforcement learning

M Botvinick, JX Wang, W Dabney, KJ Miller… - Neuron, 2020 - cell.com

The emergence of powerful artificial intelligence (AI) is defining new research directions in
neuroscience. To date, this research has focused largely on deep neural networks trained …

Cited by 202 · Related articles · All 7 versions

Distributional reinforcement learning in the brain

AS Lowet, Q Zheng, S Matias, J Drugowitsch… - Trends in …, 2020 - cell.com

Learning about rewards and punishments is critical for survival. Classical studies have
demonstrated an impressive correspondence between the firing of dopamine neurons in the …

Cited by 61 · Related articles · All 12 versions

A survey and critique of multiagent deep reinforcement learning

P Hernandez-Leal, B Kartal, ME Taylor - Autonomous Agents and Multi …, 2019 - Springer

Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has
led to a dramatic increase in the number of applications and methods. Recent works have …

Cited by 638 · Related articles · All 8 versions

Phasic policy gradient

KW Cobbe, J Hilton, O Klimov… - … on Machine Learning, 2021 - proceedings.mlr.press

We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework
which modifies traditional on-policy actor-critic methods by separating policy and value …

Cited by 175 · Related articles · All 5 versions

DeepMDP: Learning continuous latent space models for representation learning

C Gelada, S Kumar, J Buckman… - International …, 2019 - proceedings.mlr.press

Many reinforcement learning (RL) tasks provide the agent with high-dimensional
observations that can be simplified into low-dimensional continuous states. To formalize this …

Cited by 322 · Related articles · All 4 versions

Conservative offline distributional reinforcement learning

Y Ma, D Jayaraman, O Bastani - Advances in neural …, 2021 - proceedings.neurips.cc

Many reinforcement learning (RL) problems in practice are offline, learning purely from
observational data. A key challenge is how to ensure the learned policy is safe, which …

Cited by 84 · Related articles · All 7 versions

Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research

JSO Ceron, PS Castro - International Conference on …, 2021 - proceedings.mlr.press

Since the introduction of DQN, a vast majority of reinforcement learning research has
focused on reinforcement learning with deep neural networks as function approximators …

Cited by 113 · Related articles · All 2 versions

Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors

J Duan, Y Guan, SE Li, Y Ren, Q Sun… - IEEE transactions on …, 2021 - ieeexplore.ieee.org

In reinforcement learning (RL), function approximation errors are known to easily lead to
Q-value overestimations, thus greatly reducing policy performance. This article presents a …

Cited by 181 · Related articles · All 7 versions

Munchausen reinforcement learning

N Vieillard, O Pietquin, M Geist - Advances in Neural …, 2020 - proceedings.neurips.cc

Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most algorithms, based
on temporal differences, replace the true value of a transiting state by their current estimate …

Cited by 94 · Related articles · All 8 versions

Exploit reward shifting in value-based deep-RL: Optimistic curiosity-based exploration and conservative exploitation via linear reward shaping

H Sun, L Han, R Yang, X Ma… - Advances in neural …, 2022 - proceedings.neurips.cc

In this work, we study the simple yet universally applicable case of reward shaping in
value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a …

Cited by 19 · Related articles · All 3 versions
