Online bootstrap inference for policy evaluation in reinforcement learning

P Ramprasad, Y Li, Z Yang, Z Wang… - Journal of the …, 2023 - Taylor & Francis
The recent emergence of reinforcement learning (RL) has created a demand for robust
statistical inference methods for the parameter estimates computed using these algorithms …

Langevin Thompson sampling with logarithmic communication: bandits and reinforcement learning

A Karbasi, NL Kuang, Y Ma… - … Conference on Machine …, 2023 - proceedings.mlr.press
Thompson sampling (TS) is widely used in sequential decision making due to its ease of use
and appealing empirical performance. However, many existing analytical and empirical …

Constrained reinforcement learning in hard exploration problems

P Pankayaraj, P Varakantham - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
One approach to guaranteeing safety in Reinforcement Learning is through cost constraints
that are dependent on the policy. Recent works in constrained RL have developed methods …

Local advantage networks for cooperative multi-agent reinforcement learning

R Avalos, M Reymond, A Nowé, DM Roijers - arXiv preprint arXiv …, 2021 - arxiv.org
Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for
cooperative partially observable environments focus on finding factorized value functions …

Stateful offline contextual policy evaluation and learning

N Kallus, A Zhou - International Conference on Artificial …, 2022 - proceedings.mlr.press
We study off-policy evaluation and learning from sequential data in a structured class of
Markov decision processes that arise from repeated interactions with an exogenous …

Distillation of RL policies with formal guarantees via variational abstraction of Markov decision processes

F Delgrange, A Nowé, GA Pérez - … of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
We consider the challenge of policy simplification and verification in the context of policies
learned through reinforcement learning (RL) in continuous environments. In well-behaved …

Learning to Reach Goals via Diffusion

V Jain, S Ravanbakhsh - arXiv preprint arXiv:2310.02505, 2023 - arxiv.org
Diffusion models are a powerful class of generative models capable of mapping random
noise in high-dimensional spaces to a target manifold through iterative denoising. In this …

Wasserstein auto-encoded MDPs: Formal verification of efficiently distilled RL policies with many-sided guarantees

F Delgrange, A Nowé, GA Pérez - arXiv preprint arXiv:2303.12558, 2023 - arxiv.org
Although deep reinforcement learning (DRL) has many success stories, the large-scale
deployment of policies learned through these advanced techniques in safety-critical …

Off-policy actor-critic with emphatic weightings

E Graves, E Imani, R Kumaraswamy, M White - Journal of Machine …, 2023 - jmlr.org
A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due
to the policy gradient theorem, which provides a simplified form for the gradient. The off …

Client selection for federated policy optimization with environment heterogeneity

Z Xie, SH Song - arXiv preprint arXiv:2305.10978, 2023 - arxiv.org
The development of Policy Iteration (PI) has inspired many recent algorithms for
Reinforcement Learning (RL), including several policy gradient methods that gained both …