Distributional offline policy evaluation with predictive error guarantees

R Wu, M Uehara, W Sun - International Conference on …, 2023 - proceedings.mlr.press
We study the problem of estimating the distribution of the return of a policy using an offline
dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE) …

The benefits of being distributional: Small-loss bounds for reinforcement learning

K Wang, K Zhou, R Wu, N Kallus… - Advances in Neural …, 2023 - proceedings.neurips.cc
While distributional reinforcement learning (DistRL) has been empirically effective, the
question of when and why it is better than vanilla, non-distributional RL has remained …

More benefits of being distributional: Second-order bounds for reinforcement learning

K Wang, O Oertell, A Agarwal, N Kallus… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the
return distribution, can obtain second-order bounds in both online and offline RL in general …

One-step distributional reinforcement learning

M Achab, R Alami, YAD Djilali, K Fedyanin… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning (RL) allows an agent interacting sequentially with an environment to
maximize its long-term expected return. In the distributional RL (DistrRL) paradigm, the …

Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation

Y Chen, X Zhang, S Wang, L Huang - arXiv preprint arXiv:2402.18159, 2024 - arxiv.org
In the realm of reinforcement learning (RL), accounting for risk is crucial for making
decisions under uncertainty, particularly in applications where safety and reliability are …

Value-Distributional Model-Based Reinforcement Learning

CE Luis, AG Bottero, J Vinogradska… - arXiv preprint arXiv …, 2023 - arxiv.org
Quantifying uncertainty about a policy's long-term performance is important to solve
sequential decision-making tasks. We study the problem from a model-based Bayesian …

Variance control for distributional reinforcement learning

Q Kuang, Z Zhu, L Zhang, F Zhou - arXiv preprint arXiv:2307.16152, 2023 - arxiv.org
Although distributional reinforcement learning (DRL) has been widely examined in the past
few years, very few studies investigate the validity of the obtained Q-function estimator in the …

The Kernel Density Integral Transformation

C McCarter - arXiv preprint arXiv:2309.10194, 2023 - arxiv.org
Feature preprocessing continues to play a critical role when applying machine learning and
statistical methods to tabular data. In this paper, we propose the use of the kernel density …

Distributional policy evaluation: a maximum entropy approach to representation learning

R Zamboni, AM Metelli… - Advances in Neural …, 2024 - proceedings.neurips.cc
The Maximum Entropy (Max-Ent) framework has been effectively employed in a
variety of Reinforcement Learning (RL) tasks. In this paper, we first propose a novel Max-Ent …

Policy Evaluation in Distributional LQR (Extended Version)

Z Wang, Y Gao, S Wang, MM Zavlanos, A Abate… - arXiv preprint arXiv …, 2023 - arxiv.org
Distributional reinforcement learning (DRL) enhances the understanding of the effects of the
randomness in the environment by letting agents learn the distribution of a random return …