Averaging $n$-step Returns Reduces Variance in Reinforcement Learning

B Daley, M White, MC Machado - Forty-first International …, 2024 - openreview.net
Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to
improve the sample efficiency of reinforcement learning (RL) methods. The variance of the …
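
For reference, the two estimators named above have standard textbook definitions (not specific to this paper's analysis): the $n$-step return truncates the reward sum after $n$ steps and bootstraps from the current value estimate,

$G_t^{(n)} = \sum_{k=0}^{n-1} \gamma^k R_{t+k+1} + \gamma^n V(S_{t+n}),$

while the $\lambda$-return is a geometrically weighted average of all $n$-step returns,

$G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}.$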

More benefits of being distributional: Second-order bounds for reinforcement learning

K Wang, O Oertell, A Agarwal, N Kallus… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the
return distribution, can obtain second-order bounds in both online and offline RL in general …
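
As background (the standard distributional RL setup, not a statement of this paper's bounds): DistRL models the full return as a random variable $Z^{\pi}(s,a)$ satisfying the distributional Bellman equation

$Z^{\pi}(s,a) \overset{D}{=} R(s,a) + \gamma Z^{\pi}(S', A'), \qquad S' \sim P(\cdot \mid s,a), \; A' \sim \pi(\cdot \mid S'),$

where $\overset{D}{=}$ denotes equality in distribution, and the usual value function is recovered as $Q^{\pi}(s,a) = \mathbb{E}[Z^{\pi}(s,a)]$.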

A Long $N$-step Surrogate Stage Reward for Deep Reinforcement Learning

J Zhong, R Wu, J Si - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We introduce a new stage reward estimator named the long $N$-step surrogate stage
(LNSS) reward for deep reinforcement learning (RL). It aims at mitigating the high variance …

[PDF][PDF] On-policy vs. off-policy updates for deep reinforcement learning

M Hausknecht, P Stone - Deep reinforcement learning: frontiers …, 2016 - cs.utexas.edu
Temporal-difference-based deep reinforcement learning methods have typically been
driven by off-policy, bootstrap Q-Learning updates. In this paper, we investigate the effects of …
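
For context, the off-policy bootstrap update referred to here is, in its tabular form, the standard Q-Learning rule

$Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big],$

whose target maximizes over the next action regardless of which policy generated the transition; this is what makes the update off-policy, in contrast to on-policy targets such as SARSA's $r + \gamma Q(s',a')$ with $a'$ drawn from the current policy.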

Hyperparameters in reinforcement learning and how to tune them

T Eimer, M Lindauer… - … Conference on Machine …, 2023 - proceedings.mlr.press
In order to improve reproducibility, deep reinforcement learning (RL) has been adopting
better scientific practices such as standardized evaluation metrics and reporting. However …

TD or not TD: Analyzing the role of temporal differencing in deep reinforcement learning

A Amiranashvili, A Dosovitskiy, V Koltun… - arXiv preprint arXiv …, 2018 - arxiv.org
Our understanding of reinforcement learning (RL) has been shaped by theoretical and
empirical results that were obtained decades ago using tabular representations and linear …
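
For context, temporal differencing refers to bootstrapped value updates of the form

$V(s_t) \leftarrow V(s_t) + \alpha \big[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \big],$

as opposed to Monte Carlo methods that regress $V(s_t)$ toward the full sampled return $\sum_{k \ge 0} \gamma^k r_{t+k+1}$; this is the contrast the title alludes to.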

Probabilistic inference in reinforcement learning done right

J Tarbouriech, T Lattimore… - Advances in Neural …, 2024 - proceedings.neurips.cc
A popular perspective in reinforcement learning (RL) casts the problem as probabilistic
inference on a graphical model of the Markov decision process (MDP). The core object of …
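
As background (the common control-as-inference framing, not necessarily the construction this paper advocates): optimality is encoded by a binary variable per timestep with

$p(\mathcal{O}_t = 1 \mid s_t, a_t) \propto \exp\big(r(s_t, a_t)\big),$

so that maximizing expected reward is recast as inferring the posterior over trajectories conditioned on $\mathcal{O}_{1:T} = 1$.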

Learning dynamics and generalization in deep reinforcement learning

C Lyle, M Rowland, W Dabney… - International …, 2022 - proceedings.mlr.press
Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a
potentially discontinuous value function, and generalizing well to new observations. In this …

Striving for simplicity in off-policy deep reinforcement learning

R Agarwal, D Schuurmans, M Norouzi - 2019 - openreview.net
This paper advocates the use of offline (batch) reinforcement learning (RL) to help (1) isolate
the contributions of exploitation vs. exploration in off-policy deep RL, (2) improve …

Diffusion models for reinforcement learning: A survey

Z Zhu, H Zhao, H He, Y Zhong, S Zhang, Y Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have emerged as a prominent class of generative models, surpassing
previous methods regarding sample quality and training stability. Recent works have shown …