Distributional offline policy evaluation with predictive error guarantees

R Wu, M Uehara, W Sun - International Conference on …, 2023 - proceedings.mlr.press
We study the problem of estimating the distribution of the return of a policy using an offline
dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE) …

The benefits of being distributional: Small-loss bounds for reinforcement learning

K Wang, K Zhou, R Wu, N Kallus… - Advances in Neural …, 2023 - proceedings.neurips.cc
While distributional reinforcement learning (DistRL) has been empirically effective, the
question of when and why it is better than vanilla, non-distributional RL has remained …

More benefits of being distributional: Second-order bounds for reinforcement learning

K Wang, O Oertell, A Agarwal, N Kallus… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the
return distribution, can obtain second-order bounds in both online and offline RL in general …

One-step distributional reinforcement learning

M Achab, R Alami, YAD Djilali, K Fedyanin… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning (RL) allows an agent interacting sequentially with an environment to
maximize its long-term expected return. In the distributional RL (DistrRL) paradigm, the …

Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation

Y Chen, X Zhang, S Wang, L Huang - arXiv preprint arXiv:2402.18159, 2024 - arxiv.org
In the realm of reinforcement learning (RL), accounting for risk is crucial for making
decisions under uncertainty, particularly in applications where safety and reliability are …

Value-Distributional Model-Based Reinforcement Learning

CE Luis, AG Bottero, J Vinogradska… - arXiv preprint arXiv …, 2023 - arxiv.org
Quantifying uncertainty about a policy's long-term performance is important to solve
sequential decision-making tasks. We study the problem from a model-based Bayesian …

Variance control for distributional reinforcement learning

Q Kuang, Z Zhu, L Zhang, F Zhou - arXiv preprint arXiv:2307.16152, 2023 - arxiv.org
Although distributional reinforcement learning (DRL) has been widely examined in the past
few years, very few studies investigate the validity of the obtained Q-function estimator in the …

The Kernel Density Integral Transformation

C McCarter - arXiv preprint arXiv:2309.10194, 2023 - arxiv.org
Feature preprocessing continues to play a critical role when applying machine learning and
statistical methods to tabular data. In this paper, we propose the use of the kernel density …

Distributional policy evaluation: a maximum entropy approach to representation learning

R Zamboni, AM Metelli… - Advances in Neural …, 2024 - proceedings.neurips.cc
The Maximum Entropy (Max-Ent) framework has been effectively employed in a
variety of Reinforcement Learning (RL) tasks. In this paper, we first propose a novel Max-Ent …

Policy Evaluation in Distributional LQR (Extended Version)

Z Wang, Y Gao, S Wang, MM Zavlanos, A Abate… - arXiv preprint arXiv …, 2023 - arxiv.org
Distributional reinforcement learning (DRL) enhances the understanding of the effects of the
randomness in the environment by letting agents learn the distribution of a random return …