Safe RLHF: Safe Reinforcement Learning from Human Feedback

J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the development of large language models (LLMs), striking a balance between the
performance and safety of AI systems has never been more critical. However, the inherent …

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

Z Zhu, X Wang, W Zhao, C Min, N Deng, M Dou… - arXiv preprint arXiv …, 2024 - arxiv.org
General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …

Games for Artificial Intelligence Research: A Review and Perspectives

C Hu, Y Zhao, Z Wang, H Du… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Games have been perfect test-beds for artificial intelligence research due to characteristics that widely exist in real-world scenarios. Learning and optimisation, decision …

PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models

J Ji, D Hong, B Zhang, B Chen, J Dai, B Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce the PKU-SafeRLHF dataset, designed to promote research on
safety alignment in large language models (LLMs). As a sibling project to SafeRLHF and …

Deep Policy Optimization with Temporal Logic Constraints

A Shah, C Voloshin, C Yang, A Verma… - arXiv preprint arXiv …, 2024 - arxiv.org
Temporal logics, such as linear temporal logic (LTL), offer a precise means of specifying
tasks for (deep) reinforcement learning (RL) agents. In our work, we consider the setting …

Energy Management for Hybrid Electric Vehicles Using Safe Hybrid-Action Reinforcement Learning

J Xu, Y Lin - Mathematics, 2024 - mdpi.com
Reinforcement learning has shown success in solving complex control problems, yet safety
remains paramount in engineering applications like energy management systems (EMS) …

Collaborative promotion: Achieving safety and task performance by integrating imitation reinforcement learning

C Zhang, X Zhang, H Zhang, F Zhu - Expert Systems with Applications, 2024 - Elsevier
Although the importance of safety is self-evident for artificial intelligence, like the two sides of
a coin, excessive focus on safety performance without considering task performance may …

Feasibility Consistent Representation Learning for Safe Reinforcement Learning

Z Cen, Y Yao, Z Liu, D Zhao - arXiv preprint arXiv:2405.11718, 2024 - arxiv.org
In the field of safe reinforcement learning (RL), finding a balance between satisfying safety
constraints and optimizing reward performance presents a significant challenge. A key …

Safe Multi-agent Reinforcement Learning with Natural Language Constraints

Z Wang, M Fang, T Tomilin, F Fang, Y Du - arXiv preprint arXiv …, 2024 - arxiv.org
The role of natural language constraints in Safe Multi-agent Reinforcement Learning
(MARL) is crucial, yet often overlooked. While Safe MARL has vast potential, especially in …

Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding

D Bethell, S Gerasimou, R Calinescu… - arXiv preprint arXiv …, 2024 - arxiv.org
Ensuring safe exploration of reinforcement learning (RL) agents during training is a
critical prerequisite for deploying RL agents in many real-world scenarios. Training RL …