A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Reinforcement Learning (RL) has achieved tremendous success in many complex
decision-making tasks. However, safety concerns are raised when deploying RL in real-world …

Natural policy gradient primal-dual method for constrained Markov decision processes

D Ding, K Zhang, T Başar… - Advances in Neural …, 2020 - proceedings.neurips.cc
We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …

Probable domain generalization via quantile risk minimization

C Eastwood, A Robey, S Singh… - Advances in …, 2022 - proceedings.neurips.cc
Domain generalization (DG) seeks predictors which perform well on unseen test
distributions by leveraging data drawn from multiple related training distributions or …

CRPO: A new approach for safe reinforcement learning with convergence guarantee

T Xu, Y Liang, G Lan - International Conference on Machine …, 2021 - proceedings.mlr.press
In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …

Conservative safety critics for exploration

H Bharadhwaj, A Kumar, N Rhinehart, S Levine… - arXiv preprint arXiv …, 2020 - arxiv.org
Safe exploration presents a major challenge in reinforcement learning (RL): when active
data collection requires deploying partially trained policies, we must ensure that these …

Challenges and opportunities of machine learning control in building operations

L Zhang, Z Chen, X Zhang, A Pertzborn, X Jin - Building Simulation, 2023 - Springer
Machine learning control (MLC) is a highly flexible and adaptable method that
enables the design, modeling, tuning, and maintenance of building controllers to be more …

Provably efficient safe exploration via primal-dual policy optimization

D Ding, X Wei, Z Yang, Z Wang… - … conference on artificial …, 2021 - proceedings.mlr.press
We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …

Learning policies with zero or bounded constraint violation for constrained MDPs

T Liu, R Zhou, D Kalathil, P Kumar… - Advances in Neural …, 2021 - proceedings.neurips.cc
We address the issue of safety in reinforcement learning. We pose the problem in an
episodic framework of a constrained Markov decision process. Existing results have shown …

Last-iterate convergent policy gradient primal-dual methods for constrained MDPs

D Ding, CY Wei, K Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
We study the problem of computing an optimal policy of an infinite-horizon discounted
constrained Markov decision process (constrained MDP). Despite the popularity of …

Decentralized policy gradient descent ascent for safe multi-agent reinforcement learning

S Lu, K Zhang, T Chen, T Başar, L Horesh - Proceedings of the AAAI …, 2021 - ojs.aaai.org
This paper deals with distributed reinforcement learning problems with safety constraints. In
particular, we consider that a team of agents cooperate in a shared environment, where …