A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Reinforcement Learning (RL) has achieved tremendous success in many complex
decision-making tasks. However, safety concerns are raised during deploying RL in real-world …

Cited by 272 · Related articles · All 2 versions

[PDF] mlr.press

Responsive safety in reinforcement learning by PID Lagrangian methods

A Stooke, J Achiam, P Abbeel - International Conference on …, 2020 - proceedings.mlr.press

Lagrangian methods are widely used algorithms for constrained optimization problems, but
their learning dynamics exhibit oscillations and overshoot which, when applied to safe …

Cited by 298 · Related articles · All 5 versions

[PDF] neurips.cc

Natural policy gradient primal-dual method for constrained Markov decision processes

D Ding, K Zhang, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc

We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …

Cited by 213 · Related articles · All 8 versions

[PDF] mlr.press

CRPO: A new approach for safe reinforcement learning with convergence guarantee

T Xu, Y Liang, G Lan - International Conference on Machine …, 2021 - proceedings.mlr.press

In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …

Cited by 147 · Related articles · All 7 versions

[PDF] arxiv.org

Conservative safety critics for exploration

H Bharadhwaj, A Kumar, N Rhinehart, S Levine… - arXiv preprint arXiv …, 2020 - arxiv.org

Safe exploration presents a major challenge in reinforcement learning (RL): when active
data collection requires deploying partially trained policies, we must ensure that these …

Cited by 146 · Related articles · All 3 versions

[PDF] mlr.press

Provably efficient safe exploration via primal-dual policy optimization

D Ding, X Wei, Z Yang, Z Wang… - … conference on artificial …, 2021 - proceedings.mlr.press

We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …

Cited by 183 · Related articles · All 9 versions

[PDF] researchgate.net

Optimizing Long-Term Efficiency and Fairness in Ride-Hailing under Budget Constraint via Joint Order Dispatching and Driver Repositioning

J Sun, H Jin, Z Yang, L Su - IEEE Transactions on Knowledge …, 2024 - ieeexplore.ieee.org

Ride-hailing platforms (e.g., Uber and Didi Chuxing) have become increasingly popular in
recent years. Efficiency has always been an important metric for such platforms. However …

Cited by 2529 · Related articles · All 24 versions

[PDF] arxiv.org

Exploration-exploitation in constrained MDPs

Y Efroni, S Mannor, M Pirotta - arXiv preprint arXiv:2003.02189, 2020 - arxiv.org

In many sequential decision-making problems, the goal is to optimize a utility function while
satisfying a set of constraints on different utilities. This learning problem is formalized …

Cited by 175 · Related articles · All 2 versions

[PDF] neurips.cc

Constrained update projection approach to safe policy optimization

L Yang, J Ji, J Dai, L Zhang, B Zhou… - Advances in …, 2022 - proceedings.neurips.cc

Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only
maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a …

Cited by 45 · Related articles · All 9 versions

[PDF] arxiv.org

Penalized proximal policy optimization for safe reinforcement learning

L Zhang, L Shen, L Yang, S Chen, B Yuan… - arXiv preprint arXiv …, 2022 - arxiv.org

Safe reinforcement learning aims to learn the optimal policy while satisfying safety
constraints, which is essential in real-world applications. However, current algorithms still …

Cited by 64 · Related articles · All 4 versions
