Hyperbolic discounting and learning over multiple horizons

A Kwiatkowski, E Alvarado, V Kalogeiton… - Computer Graphics …, 2022 - Wiley Online Library

Reinforcement Learning is an area of Machine Learning focused on how agents can be
trained to make sequential decisions, and achieve a particular goal within an arbitrary …

Cited by 37 · Related articles · All 14 versions

[HTML] nih.gov

A distributional code for value in dopamine-based reinforcement learning

W Dabney, Z Kurth-Nelson, N Uchida, CK Starkweather… - Nature, 2020 - nature.com

Since its introduction, the reward prediction error theory of dopamine has explained a wealth
of empirical phenomena, providing a unifying framework for understanding the …

Cited by 417 · Related articles · All 13 versions

[PDF] arxiv.org

First return, then explore

A Ecoffet, J Huizinga, J Lehman, KO Stanley, J Clune - Nature, 2021 - nature.com

Reinforcement learning promises to solve complex sequential-decision problems
autonomously by specifying a high-level reward function only. However, reinforcement …

Cited by 360 · Related articles · All 10 versions

[PDF] nature.com

Dopamine transients follow a striatal gradient of reward time horizons

A Mohebi, W Wei, L Pelattini, K Kim, JD Berke - Nature Neuroscience, 2024 - nature.com

Animals make predictions to guide their behavior and update those predictions through
experience. Transient increases in dopamine (DA) are thought to be critical signals for …

Cited by 9 · Related articles · All 7 versions

[PDF] neurips.cc

On the expressivity of Markov reward

D Abel, W Dabney, A Harutyunyan… - Advances in …, 2021 - proceedings.neurips.cc

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to
understanding the expressivity of reward as a way to capture tasks that we would want an …

Cited by 86 · Related articles · All 12 versions

[PDF] arxiv.org

Recurrent model-free RL can be a strong baseline for many POMDPs

T Ni, B Eysenbach, R Salakhutdinov - arXiv preprint arXiv:2110.05038, 2021 - arxiv.org

Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit
assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with …

Cited by 81 · Related articles · All 4 versions

[PDF] mlr.press

Settling the reward hypothesis

M Bowling, JD Martin, D Abel… - … on Machine Learning, 2023 - proceedings.mlr.press

The reward hypothesis posits that "all of what we mean by goals and purposes can be well
thought of as maximization of the expected value of the cumulative sum of a received scalar …

Cited by 22 · Related articles · All 8 versions

[PDF] mlr.press

On the effect of auxiliary tasks on representation dynamics

C Lyle, M Rowland, G Ostrovski… - International …, 2021 - proceedings.mlr.press

While auxiliary tasks play a key role in shaping the representations learnt by reinforcement
learning agents, much is still unknown about the mechanisms through which this is …

Cited by 69 · Related articles · All 4 versions

[PDF] enseeiht.fr

[BOOK][B] Distributional reinforcement learning

MG Bellemare, W Dabney, M Rowland - 2023 - books.google.com

The first comprehensive guide to distributional reinforcement learning, providing a new
mathematical formalism for thinking about decisions from a probabilistic perspective …

Cited by 118 · Related articles · All 9 versions

[PDF] neurips.cc

A self-tuning actor-critic algorithm

T Zahavy, Z Xu, V Veeriah, M Hessel… - Advances in neural …, 2020 - proceedings.neurips.cc

Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters,
typically requiring significant manual effort to identify hyperparameters that perform well on a …

Cited by 83 · Related articles · All 6 versions