Averaging $n$-step Returns Reduces Variance in Reinforcement Learning

B Daley, M White, MC Machado - Forty-first International …, 2024 - openreview.net
Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to
improve the sample efficiency of reinforcement learning (RL) methods. The variance of the …
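
For reference, the two estimators named above have standard textbook definitions (not specific to this paper's analysis): the $n$-step return truncates the reward sum after $n$ steps and bootstraps from the current value estimate,

$G_t^{(n)} = \sum_{k=0}^{n-1} \gamma^k R_{t+k+1} + \gamma^n V(S_{t+n}),$

while the $\lambda$-return is a geometrically weighted average of all $n$-step returns,

$G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}.$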

More benefits of being distributional: Second-order bounds for reinforcement learning

K Wang, O Oertell, A Agarwal, N Kallus… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the
return distribution, can obtain second-order bounds in both online and offline RL in general …
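
As background (the standard distributional RL setup, not a statement of this paper's bounds): DistRL models the full return as a random variable $Z^{\pi}(s,a)$ satisfying the distributional Bellman equation

$Z^{\pi}(s,a) \overset{D}{=} R(s,a) + \gamma Z^{\pi}(S', A'), \qquad S' \sim P(\cdot \mid s,a), \; A' \sim \pi(\cdot \mid S'),$

where $\overset{D}{=}$ denotes equality in distribution, and the usual value function is recovered as $Q^{\pi}(s,a) = \mathbb{E}[Z^{\pi}(s,a)]$.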

A Long $N$-step Surrogate Stage Reward for Deep Reinforcement Learning

J Zhong, R Wu, J Si - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We introduce a new stage reward estimator named the long $N$-step surrogate stage
(LNSS) reward for deep reinforcement learning (RL). It aims at mitigating the high variance …

[PDF][PDF] On-policy vs. off-policy updates for deep reinforcement learning

M Hausknecht, P Stone - Deep reinforcement learning: frontiers …, 2016 - cs.utexas.edu
Temporal-difference-based deep reinforcement learning methods have typically been
driven by off-policy, bootstrap Q-Learning updates. In this paper, we investigate the effects of …
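
For context, the off-policy bootstrap update referred to here is, in its tabular form, the standard Q-Learning rule

$Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big],$

whose target maximizes over the next action regardless of which policy generated the transition; this is what makes the update off-policy, in contrast to on-policy targets such as SARSA's $r + \gamma Q(s',a')$ with $a'$ drawn from the current policy.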

Hyperparameters in reinforcement learning and how to tune them

T Eimer, M Lindauer… - … Conference on Machine …, 2023 - proceedings.mlr.press
In order to improve reproducibility, deep reinforcement learning (RL) has been adopting
better scientific practices such as standardized evaluation metrics and reporting. However …

TD or not TD: Analyzing the role of temporal differencing in deep reinforcement learning

A Amiranashvili, A Dosovitskiy, V Koltun… - arXiv preprint arXiv …, 2018 - arxiv.org
Our understanding of reinforcement learning (RL) has been shaped by theoretical and
empirical results that were obtained decades ago using tabular representations and linear …
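
For context, temporal differencing refers to bootstrapped value updates of the form

$V(s_t) \leftarrow V(s_t) + \alpha \big[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \big],$

as opposed to Monte Carlo methods that regress $V(s_t)$ toward the full sampled return $\sum_{k \ge 0} \gamma^k r_{t+k+1}$; this is the contrast the title alludes to.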

Probabilistic inference in reinforcement learning done right

J Tarbouriech, T Lattimore… - Advances in Neural …, 2024 - proceedings.neurips.cc
A popular perspective in reinforcement learning (RL) casts the problem as probabilistic
inference on a graphical model of the Markov decision process (MDP). The core object of …
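
As background (the common control-as-inference framing, not necessarily the construction this paper advocates): optimality is encoded by a binary variable per timestep with

$p(\mathcal{O}_t = 1 \mid s_t, a_t) \propto \exp\big(r(s_t, a_t)\big),$

so that maximizing expected reward is recast as inferring the posterior over trajectories conditioned on $\mathcal{O}_{1:T} = 1$.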

Learning dynamics and generalization in deep reinforcement learning

C Lyle, M Rowland, W Dabney… - International …, 2022 - proceedings.mlr.press
Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a
potentially discontinuous value function, and generalizing well to new observations. In this …

Striving for simplicity in off-policy deep reinforcement learning

R Agarwal, D Schuurmans, M Norouzi - 2019 - openreview.net
This paper advocates the use of offline (batch) reinforcement learning (RL) to help (1) isolate
the contributions of exploitation vs. exploration in off-policy deep RL, (2) improve …

Diffusion models for reinforcement learning: A survey

Z Zhu, H Zhao, H He, Y Zhong, S Zhang, Y Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have emerged as a prominent class of generative models, surpassing
previous methods regarding sample quality and training stability. Recent works have shown …