Scaling laws for imitation learning in NetHack

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org

This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Cited by 118 · Related articles · All 3 versions

Motif: Intrinsic motivation from artificial intelligence feedback

M Klissarov, P D'Oro, S Sodhani, R Raileanu… - arXiv preprint arXiv …, 2023 - arxiv.org

Exploring rich environments and evaluating one's actions without prior knowledge is
immensely challenging. In this paper, we propose Motif, a general method to interface such …

Cited by 46 · Related articles · All 6 versions

diff History for Long-Context Language Agents

U Piterbarg, L Pinto, R Fergus - arXiv preprint arXiv:2312.07540, 2023 - arxiv.org

Language Models (LMs) offer an exciting solution for general-purpose embodied control.
However, a key technical issue arises when using an LM-based controller: environment …

Cited by 1 · Related articles · All 2 versions

Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem

M Wołczyk, B Cupiał, M Ostaszewski… - arXiv preprint arXiv …, 2024 - arxiv.org

Fine-tuning is a widespread technique that allows practitioners to transfer pre-trained
capabilities, as recently showcased by the successful applications of foundation models …

Cited by 10 · Related articles · All 3 versions

Syllabus: Portable Curricula for Reinforcement Learning Agents

R Sullivan, R Pégoud, AU Rahmen, X Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

Curriculum learning has been a quiet yet crucial component of many of the high-profile
successes of reinforcement learning. Despite this, none of the major reinforcement learning …

AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws

O Neumann, C Gros - arXiv preprint arXiv:2412.11979, 2024 - arxiv.org

Neural scaling laws are observed in a range of domains, to date with no clear understanding
of why they occur. Recent theories suggest that loss power laws arise from Zipf's law, a …

diff History for Neural Language Agents

U Piterbarg, L Pinto, R Fergus - Forty-first International Conference on … - openreview.net

Neural Language Models (LMs) offer an exciting solution for general-purpose embodied
control. However, a key technical issue arises when using an LM-based controller …

Enhancing PPO with Intrinsic Rewards: A Study in the NetHack Environment

P Yanushonak - 2024 - dspace.cuni.cz

This thesis evaluates the effectiveness of various intrinsic reward mechanisms in enhancing
the performance of Asynchronous Proximal Policy Optimization (APPO) in a complex, sparse …
