Training language models to follow instructions with human feedback

L Ouyang, J Wu, X Jiang, D Almeida… - Advances in neural …, 2022 - proceedings.neurips.cc
Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …

Chain of hindsight aligns language models with feedback

H Liu, C Sferrazza, P Abbeel - arXiv preprint arXiv:2302.02676, 2023 - arxiv.org
Learning from human preferences is important for language models to match human needs
and to align with human and social values. Prior works have achieved remarkable …

Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?

W Shi, X Han, H Gonen, A Holtzman, Y Tsvetkov… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models can perform new tasks in a zero-shot fashion, given natural
language prompts that specify the desired behavior. Such prompts are typically hand …

Black-box prompt optimization: Aligning large language models without model training

J Cheng, X Liu, K Zheng, P Ke, H Wang, Y Dong… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have shown impressive success in various applications.
However, these models are often not well aligned with human intents, which calls for …

Activation addition: Steering language models without optimization

A Turner, L Thiergart, D Udell, G Leech, U Mini… - arXiv preprint arXiv …, 2023 - arxiv.org
Reliably controlling the behavior of large language models (LLMs) is a pressing open
problem. Existing methods include supervised finetuning, reinforcement learning from …

Eliciting human preferences with language models

BZ Li, A Tamkin, N Goodman, J Andreas - arXiv preprint arXiv:2310.11589, 2023 - arxiv.org
Language models (LMs) can be directed to perform target tasks by using labeled examples
or natural language prompts. But selecting examples or writing prompts can be …

ConstitutionMaker: Interactively critiquing large language models by converting feedback into principles

S Petridis, BD Wedin, J Wexler, M Pushkarna… - Proceedings of the 29th …, 2024 - dl.acm.org
Large language model (LLM) prompting is a promising new approach for users to create
and customize their own chatbots. However, current methods for steering a chatbot's …

STaR-GATE: Teaching language models to ask clarifying questions

C Andukuri, JP Fränken, T Gerstenberg… - arXiv preprint arXiv …, 2024 - arxiv.org
When prompting language models to complete a task, users often leave important aspects
unsaid. While asking questions could resolve this ambiguity (GATE; Li et al., 2023) …

Pretraining language models with human preferences

T Korbak, K Shi, A Chen, RV Bhalerao… - International …, 2023 - proceedings.mlr.press
Language models (LMs) are pretrained to imitate text from large and diverse
datasets that contain content that would violate human preferences if generated by an LM …

Large language models are human-level prompt engineers

Y Zhou, AI Muresanu, Z Han, K Paster, S Pitis… - arXiv preprint arXiv …, 2022 - arxiv.org
By conditioning on natural language instructions, large language models (LLMs) have
displayed impressive capabilities as general-purpose computers. However, task …