Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite …
Aligning large language models (LLMs) to human values has become increasingly important as it enables sophisticated steering of LLMs. However, it requires significant …
Alignment with human preference is a desired property of large language models (LLMs). Currently, the main alignment approach is based on reinforcement learning from human …
Alignment has become a critical step for instruction-tuned Large Language Models (LLMs) to become helpful assistants. However, effective evaluation of alignment for emerging …
Large language models (LLMs) have shown impressive success in various applications. However, these models are often not well aligned with human intents, which calls for …
Aligning large language models (LLMs) with human values and intents critically involves the use of human or AI feedback. While dense feedback annotations are expensive to acquire …
Alignment tuning has become the de facto standard practice for enabling base large language models (LLMs) to serve as open-domain AI assistants. The alignment tuning …
Finetuning language models with reinforcement learning (RL), e.g., from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can …
Big models, exemplified by Large Language Models (LLMs), are models typically pre-trained on massive data and composed of enormous numbers of parameters, which not only obtain …