Steady state analysis of episodic reinforcement learning

P Ramprasad, Y Li, Z Yang, Z Wang… - Journal of the …, 2023 - Taylor & Francis

The recent emergence of reinforcement learning (RL) has created a demand for robust
statistical inference methods for the parameter estimates computed using these algorithms …

Cited by 33 · Related articles · All 9 versions

[PDF] mlr.press

Langevin Thompson sampling with logarithmic communication: bandits and reinforcement learning

A Karbasi, NL Kuang, Y Ma… - … Conference on Machine …, 2023 - proceedings.mlr.press

Thompson sampling (TS) is widely used in sequential decision making due to its ease of use
and appealing empirical performance. However, many existing analytical and empirical …

Cited by 6 · Related articles · All 7 versions

[PDF] aaai.org

Constrained reinforcement learning in hard exploration problems

P Pankayaraj, P Varakantham - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org

One approach to guaranteeing safety in Reinforcement Learning is through cost constraints
that are dependent on the policy. Recent works in constrained RL have developed methods …

Cited by 4 · Related articles · All 3 versions

[PDF] arxiv.org

Local advantage networks for cooperative multi-agent reinforcement learning

R Avalos, M Reymond, A Nowé, DM Roijers - arXiv preprint arXiv …, 2021 - arxiv.org

Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for
cooperative partially observable environments focus on finding factorized value functions …

Cited by 12 · Related articles · All 9 versions

[PDF] mlr.press

Stateful offline contextual policy evaluation and learning

N Kallus, A Zhou - International Conference on Artificial …, 2022 - proceedings.mlr.press

We study off-policy evaluation and learning from sequential data in a structured class of
Markov decision processes that arise from repeated interactions with an exogenous …

Cited by 9 · Related articles · All 6 versions

[PDF] aaai.org

Distillation of RL policies with formal guarantees via variational abstraction of Markov decision processes

F Delgrange, A Nowé, GA Pérez - … of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org

We consider the challenge of policy simplification and verification in the context of policies
learned through reinforcement learning (RL) in continuous environments. In well-behaved …

Cited by 9 · Related articles · All 12 versions

[PDF] arxiv.org

Learning to Reach Goals via Diffusion

V Jain, S Ravanbakhsh - arXiv preprint arXiv:2310.02505, 2023 - arxiv.org

Diffusion models are a powerful class of generative models capable of mapping random
noise in high-dimensional spaces to a target manifold through iterative denoising. In this …

Cited by 1 · Related articles · All 4 versions

[PDF] arxiv.org

Wasserstein auto-encoded MDPs: Formal verification of efficiently distilled RL policies with many-sided guarantees

F Delgrange, A Nowé, GA Pérez - arXiv preprint arXiv:2303.12558, 2023 - arxiv.org

Although deep reinforcement learning (DRL) has many success stories, the large-scale
deployment of policies learned through these advanced techniques in safety-critical …

Cited by 5 · Related articles · All 6 versions

[PDF] jmlr.org

Off-policy actor-critic with emphatic weightings

E Graves, E Imani, R Kumaraswamy, M White - Journal of Machine …, 2023 - jmlr.org

A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due
to the policy gradient theorem, which provides a simplified form for the gradient. The off …

Cited by 6 · Related articles · All 3 versions

[PDF] arxiv.org

Client selection for federated policy optimization with environment heterogeneity

Z Xie, SH Song - arXiv preprint arXiv:2305.10978, 2023 - arxiv.org

The development of Policy Iteration (PI) has inspired many recent algorithms for
Reinforcement Learning (RL), including several policy gradient methods that gained both …

Cited by 1 · Related articles · All 2 versions
