In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the return distribution, can obtain second-order bounds in both online and offline RL in general …
J Zhong, R Wu, J Si - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We introduce a new stage reward estimator named the long $N$-step surrogate stage (LNSS) reward for deep reinforcement learning (RL). It aims at mitigating the high variance …
M Hausknecht, P Stone, O Mc - Deep reinforcement learning: frontiers …, 2016 - cs.utexas.edu
Temporal-difference-based deep reinforcement learning methods have typically been driven by off-policy, bootstrap Q-learning updates. In this paper, we investigate the effects of …
T Eimer, M Lindauer… - … Conference on Machine …, 2023 - proceedings.mlr.press
In order to improve reproducibility, deep reinforcement learning (RL) has been adopting better scientific practices such as standardized evaluation metrics and reporting. However …
Our understanding of reinforcement learning (RL) has been shaped by theoretical and empirical results that were obtained decades ago using tabular representations and linear …
A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of …
Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this …
This paper advocates the use of offline (batch) reinforcement learning (RL) to help (1) isolate the contributions of exploitation vs. exploration in off-policy deep RL, (2) improve …
Z Zhu, H Zhao, H He, Y Zhong, S Zhang, Y Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have emerged as a prominent class of generative models, surpassing previous methods regarding sample quality and training stability. Recent works have shown …