Fine-tuning language models to find agreement among humans with diverse preferences

M Bakker, M Chadwick, H Sheahan… - Advances in …, 2022 - proceedings.neurips.cc
Recent work on large language models (LLMs) has used fine-tuning to align outputs with
the preferences of a prototypical user. This work assumes that human preferences are static …

Peering through preferences: Unraveling feedback acquisition for aligning large language models

H Bansal, J Dang, A Grover - arXiv preprint arXiv:2308.15812, 2023 - arxiv.org
Aligning large language models (LLMs) with human values and intents critically involves the
use of human or AI feedback. While dense feedback annotations are expensive to acquire …

Aligning large language models with human preferences through representation engineering

W Liu, X Wang, M Wu, T Li, C Lv, Z Ling, J Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
Aligning large language models (LLMs) with human preferences is crucial for enhancing
their utility in terms of helpfulness, truthfulness, safety, harmlessness, and interestingness …

Chatbot arena: An open platform for evaluating LLMs by human preference

WL Chiang, L Zheng, Y Sheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have unlocked new capabilities and applications; however,
evaluating the alignment with human preferences still poses significant challenges. To …

Aligning language models to user opinions

EJ Hwang, BP Majumder, N Tandon - arXiv preprint arXiv:2305.14929, 2023 - arxiv.org
An important aspect of developing LLMs that interact with humans is to align models'
behavior to their users. It is possible to prompt an LLM into behaving as a certain persona …

Whose opinions do language models reflect?

S Santurkar, E Durmus, F Ladhak… - International …, 2023 - proceedings.mlr.press
Language models (LMs) are increasingly being used in open-ended contexts,
where the opinions they reflect in response to subjective queries can have a profound …

Eliciting human preferences with language models

BZ Li, A Tamkin, N Goodman, J Andreas - arXiv preprint arXiv:2310.11589, 2023 - arxiv.org
Language models (LMs) can be directed to perform target tasks by using labeled examples
or natural language prompts. But selecting examples or writing prompts can be …

Aligning large language models with human: A survey

Y Wang, W Zhong, L Li, F Mi, X Zeng, W Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) trained on extensive textual corpora have emerged as
leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite …

Aligning language models with human preferences via a Bayesian approach

J Wang, H Wang, S Sun, W Li - Advances in Neural …, 2024 - proceedings.neurips.cc
In the quest to advance human-centric natural language generation (NLG) systems,
ensuring alignment between NLG models and human preferences is crucial. For this …

RAIN: Your language models can align themselves without finetuning

Y Li, F Wei, J Zhao, C Zhang, H Zhang - arXiv preprint arXiv:2309.07124, 2023 - arxiv.org
Large language models (LLMs) often demonstrate inconsistencies with human preferences.
Previous research gathered human preference data and then aligned the pre-trained …