Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Motif: Intrinsic motivation from artificial intelligence feedback

M Klissarov, P D'Oro, S Sodhani, R Raileanu… - arXiv preprint arXiv …, 2023 - arxiv.org
Exploring rich environments and evaluating one's actions without prior knowledge is
immensely challenging. In this paper, we propose Motif, a general method to interface such …

diff History for Long-Context Language Agents

U Piterbarg, L Pinto, R Fergus - arXiv preprint arXiv:2312.07540, 2023 - arxiv.org
Language Models (LMs) offer an exciting solution for general-purpose embodied control.
However, a key technical issue arises when using an LM-based controller: environment …

Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem

M Wołczyk, B Cupiał, M Ostaszewski… - arXiv preprint arXiv …, 2024 - arxiv.org
Fine-tuning is a widespread technique that allows practitioners to transfer pre-trained
capabilities, as recently showcased by the successful applications of foundation models …

Syllabus: Portable Curricula for Reinforcement Learning Agents

R Sullivan, R Pégoud, AU Rahmen, X Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Curriculum learning has been a quiet yet crucial component of many of the high-profile
successes of reinforcement learning. Despite this, none of the major reinforcement learning …

AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws

O Neumann, C Gros - arXiv preprint arXiv:2412.11979, 2024 - arxiv.org
Neural scaling laws are observed in a range of domains, to date with no clear understanding
of why they occur. Recent theories suggest that loss power laws arise from Zipf's law, a …

diff History for Neural Language Agents

U Piterbarg, L Pinto, R Fergus - Forty-first International Conference on … - openreview.net
Neural Language Models (LMs) offer an exciting solution for general-purpose embodied
control. However, a key technical issue arises when using an LM-based controller …

Enhancing PPO with Intrinsic Rewards: A Study in the NetHack Environment

P Yanushonak - 2024 - dspace.cuni.cz
This thesis evaluates the effectiveness of various intrinsic reward mechanisms in enhancing
the performance of Asynchronous Proximal Policy Optimization (APPO) in a complex, sparse …