Principle-driven self-alignment of language models from scratch with minimal human supervision

Z Sun, Y Shen, Q Zhou, H Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning
(SFT) with human annotations and reinforcement learning from human feedback (RLHF) to …

SALMON: Self-alignment with principle-following reward models

Z Sun, Y Shen, H Zhang, Q Zhou, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement
Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM …

Self-alignment with instruction backtranslation

X Li, P Yu, C Zhou, T Schick, L Zettlemoyer… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a scalable method to build a high-quality instruction-following language model
by automatically labelling human-written text with corresponding instructions. Our approach …

OpenAssistant conversations - democratizing large language model alignment

A Köpf, Y Kilcher, D von Rütte… - Advances in …, 2024 - proceedings.neurips.cc
Aligning large language models (LLMs) with human preferences has proven to drastically
improve usability and has driven rapid adoption as demonstrated by ChatGPT. Alignment …

Feature adaptation of pre-trained language models across languages and domains with robust self-training

H Ye, Q Tan, R He, J Li, HT Ng, L Bing - arXiv preprint arXiv:2009.11538, 2020 - arxiv.org
Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains has gained
much attention recently. Instead of fine-tuning PrLMs as done in most previous work, we …

LIMA: Less is more for alignment

C Zhou, P Liu, P Xu, S Iyer, J Sun… - Advances in …, 2024 - proceedings.neurips.cc
Large language models are trained in two stages: (1) unsupervised pretraining from raw text,
to learn general-purpose representations, and (2) large scale instruction tuning and …

Generative judge for evaluating alignment

J Li, S Sun, W Yuan, RZ Fan, H Zhao, P Liu - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid development of Large Language Models (LLMs) has substantially expanded the
range of tasks they can address. In the field of Natural Language Processing (NLP) …

Aligning large language models through synthetic feedback

S Kim, S Bae, J Shin, S Kang, D Kwak, KM Yoo… - arXiv preprint arXiv …, 2023 - arxiv.org
Aligning large language models (LLMs) to human values has become increasingly
important as it enables sophisticated steering of LLMs. However, it requires significant …

Secrets of RLHF in large language models part I: PPO

R Zheng, S Dou, S Gao, Y Hua, W Shen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have formulated a blueprint for the advancement of artificial
general intelligence. Their primary objective is to function as a human-centric (helpful, honest …

The unlocking spell on base LLMs: Rethinking alignment via in-context learning

BY Lin, A Ravichander, X Lu, N Dziri… - The Twelfth …, 2023 - openreview.net
Alignment tuning has become the de facto standard practice for enabling base large
language models (LLMs) to serve as open-domain AI assistants. The alignment tuning …