A Karbasi, NL Kuang, Y Ma… - … Conference on Machine …, 2023 - proceedings.mlr.press
Thompson sampling (TS) is widely used in sequential decision making due to its ease of use and appealing empirical performance. However, many existing analytical and empirical …
One approach to guaranteeing safety in Reinforcement Learning is through cost constraints that are dependent on the policy. Recent works in constrained RL have developed methods …
N Kallus, A Zhou - International Conference on Artificial …, 2022 - proceedings.mlr.press
We study off-policy evaluation and learning from sequential data in a structured class of Markov decision processes that arise from repeated interactions with an exogenous …
We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved …
Diffusion models are a powerful class of generative models capable of mapping random noise in high-dimensional spaces to a target manifold through iterative denoising. In this …
Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical …
A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due to the policy gradient theorem, which provides a simplified form for the gradient. The off …
Z Xie, SH Song - arXiv preprint arXiv:2305.10978, 2023 - arxiv.org
The development of Policy Iteration (PI) has inspired many recent algorithms for Reinforcement Learning (RL), including several policy gradient methods that gained both …