Safe policies for reinforcement learning via primal-dual methods

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …

Cited by 256 - Related articles - All 2 versions

[PDF] neurips.cc

Natural policy gradient primal-dual method for constrained Markov decision processes

D Ding, K Zhang, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc

We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …

Cited by 207 - Related articles - All 8 versions

[PDF] neurips.cc

Probable domain generalization via quantile risk minimization

C Eastwood, A Robey, S Singh… - Advances in …, 2022 - proceedings.neurips.cc

Domain generalization (DG) seeks predictors which perform well on unseen test
distributions by leveraging data drawn from multiple related training distributions or …

Cited by 54 - Related articles - All 6 versions

[PDF] mlr.press

CRPO: A new approach for safe reinforcement learning with convergence guarantee

T Xu, Y Liang, G Lan - International Conference on Machine …, 2021 - proceedings.mlr.press

In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …

Cited by 140 - Related articles - All 7 versions

[PDF] arxiv.org

Conservative safety critics for exploration

H Bharadhwaj, A Kumar, N Rhinehart, S Levine… - arXiv preprint arXiv …, 2020 - arxiv.org

Safe exploration presents a major challenge in reinforcement learning (RL): when active
data collection requires deploying partially trained policies, we must ensure that these …

Cited by 143 - Related articles - All 3 versions

[PDF] osti.gov

Challenges and opportunities of machine learning control in building operations

L Zhang, Z Chen, X Zhang, A Pertzborn, X Jin - Building Simulation, 2023 - Springer

Machine learning control (MLC) is a highly flexible and adaptable method that
enables the design, modeling, tuning, and maintenance of building controllers to be more …

Cited by 12 - Related articles - All 8 versions

[PDF] mlr.press

Provably efficient safe exploration via primal-dual policy optimization

D Ding, X Wei, Z Yang, Z Wang… - … conference on artificial …, 2021 - proceedings.mlr.press

We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …

Cited by 179 - Related articles - All 9 versions

[PDF] neurips.cc

Learning policies with zero or bounded constraint violation for constrained MDPs

T Liu, R Zhou, D Kalathil, P Kumar… - Advances in Neural …, 2021 - proceedings.neurips.cc

We address the issue of safety in reinforcement learning. We pose the problem in an
episodic framework of a constrained Markov decision process. Existing results have shown …

Cited by 84 - Related articles - All 8 versions

[PDF] neurips.cc

Last-iterate convergent policy gradient primal-dual methods for constrained MDPs

D Ding, CY Wei, K Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc

We study the problem of computing an optimal policy of an infinite-horizon discounted
constrained Markov decision process (constrained MDP). Despite the popularity of …

Cited by 22 - Related articles - All 6 versions

[PDF] aaai.org

Decentralized policy gradient descent ascent for safe multi-agent reinforcement learning

S Lu, K Zhang, T Chen, T Başar, L Horesh - Proceedings of the AAAI …, 2021 - ojs.aaai.org

This paper deals with distributed reinforcement learning problems with safety constraints. In
particular, we consider that a team of agents cooperate in a shared environment, where …

Cited by 67 - Related articles - All 10 versions
