AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications

TT Nguyen, ND Nguyen… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Reinforcement learning (RL) algorithms have been around for decades and employed to
solve various sequential decision-making problems. These algorithms, however, have faced …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Eureka: Human-level reward design via coding large language models

YJ Ma, W Liang, G Wang, DA Huang, O Bastani… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have excelled as high-level semantic planners for
sequential decision-making tasks. However, harnessing them to learn complex low-level …

Defining and characterizing reward gaming

J Skalse, N Howe… - Advances in Neural …, 2022 - proceedings.neurips.cc
We provide the first formal definition of reward hacking, a phenomenon where
optimizing an imperfect proxy reward function, $\tilde{\mathcal{R}}$, leads to poor …

Multi-agent deep reinforcement learning: a survey

S Gronauer, K Diepold - Artificial Intelligence Review, 2022 - Springer
Advances in reinforcement learning have achieved remarkable success in various domains.
Although the multi-agent domain has been overshadowed by its single-agent counterpart …

Reward design with language models

M Kwon, SM Xie, K Bullard, D Sadigh - arXiv preprint arXiv:2303.00001, 2023 - arxiv.org
Reward design in reinforcement learning (RL) is challenging since specifying human
notions of desired behavior may be difficult via reward functions or require many expert …

Roboclip: One demonstration is enough to learn robot policies

S Sontakke, J Zhang, S Arnold… - Advances in …, 2024 - proceedings.neurips.cc
Reward specification is a notoriously difficult problem in reinforcement learning, requiring
extensive expert supervision to design robust reward functions. Imitation learning (IL) …

Survey on large language model-enhanced reinforcement learning: Concept, taxonomy, and methods

Y Cao, H Zhao, Y Cheng, T Shu, Y Chen… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
With extensive pretrained knowledge and high-level general capabilities, large language
models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in …

Challenges of real-world reinforcement learning

G Dulac-Arnold, D Mankowitz, T Hester - arXiv preprint arXiv:1904.12901, 2019 - arxiv.org
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is
beginning to show some successes in real-world scenarios. However, much of the research …