Principle-driven self-alignment of language models from scratch with minimal human supervision

Z Sun, Y Shen, Q Zhou, H Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning
(SFT) with human annotations and reinforcement learning from human feedback (RLHF) to …

Direct language model alignment from online AI feedback

S Guo, B Zhang, T Liu, T Liu, M Khalman… - arXiv preprint arXiv …, 2024 - arxiv.org
Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as
efficient alternatives to reinforcement learning from human feedback (RLHF) that do not …
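For orientation, a minimal sketch of the standard DPO objective that DAP methods of this kind build on; this illustrates the general technique named in the snippet, not this paper's specific online-feedback procedure, and the function and argument names are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    # Sketch of the standard DPO loss: raise the policy's log-ratio margin on the
    # preferred response over the dispreferred one, relative to a frozen reference
    # model, via a logistic loss scaled by beta. Names here are illustrative.
    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        chosen_logratio = policy_chosen_logps - ref_chosen_logps
        rejected_logratio = policy_rejected_logps - ref_rejected_logps
        return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

In this framing, no separate reward model is trained; the preference signal enters the loss directly through the two log-probability ratios.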

Multilingual large language model: A survey of resources, taxonomy and frontiers

L Qin, Q Chen, Y Zhou, Z Chen, Y Li, L Liao… - arXiv preprint arXiv …, 2024 - arxiv.org
Multilingual large language models use powerful large language models to handle and
respond to queries in multiple languages, achieving remarkable …

Adversarial preference optimization

P Cheng, Y Yang, J Li, Y Dai, N Du - arXiv preprint arXiv:2311.08045, 2023 - arxiv.org
Human preference alignment is a crucial training step to improve the interaction quality of
large language models (LLMs). Existing alignment methods depend on manually annotated …

Human-instruction-free LLM self-alignment with limited samples

H Guo, Y Yao, W Shen, J Wei, X Zhang, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligning large language models (LLMs) with human values is a vital task for LLM
practitioners. Current alignment techniques have several limitations: (1) requiring a large …

Self-alignment of large language models via monopolylogue-based social scene simulation

X Pang, S Tang, R Ye, Y Xiong, B Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligning large language models (LLMs) with human values is imperative to mitigate
potential adverse effects resulting from their misuse. Drawing from the sociological insight …

Self-supervised alignment with mutual information: Learning to follow principles without preference labels

JP Fränken, E Zelikman, R Rafailov, K Gandhi… - arXiv preprint arXiv …, 2024 - arxiv.org
When prompting a language model (LM), users frequently expect the model to adhere to a
set of behavioral principles across diverse tasks, such as producing insightful content while …

The alignment ceiling: Objective mismatch in reinforcement learning from human feedback

N Lambert, R Calandra - arXiv preprint arXiv:2311.00168, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique
to make large language models (LLMs) more capable in complex settings. RLHF proceeds …
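As a point of reference for the RLHF pipeline this snippet refers to, a minimal sketch of the KL-regularized per-token reward typically optimized with PPO; the function name and the kl_coef default are illustrative assumptions, not details taken from this paper.

    import torch

    # Sketch of the shaped reward used in standard PPO-based RLHF: the reward
    # model's score minus a KL penalty that keeps the policy close to a frozen
    # reference model. Names and the coefficient are illustrative assumptions.
    def shaped_rlhf_reward(reward_model_score: torch.Tensor,
                           policy_logprob: torch.Tensor,
                           ref_logprob: torch.Tensor,
                           kl_coef: float = 0.05) -> torch.Tensor:
        # Per-token log-ratio serves as a sample estimate of KL(policy || reference).
        kl_penalty = policy_logprob - ref_logprob
        return reward_model_score - kl_coef * kl_penalty

The paper's "objective mismatch" concern sits exactly here: the policy is optimized against this proxy reward rather than against the downstream capabilities users care about.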

Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search

C Yoon, G Kim, B Jeon, S Kim, Y Jo, J Kang - arXiv preprint arXiv …, 2024 - arxiv.org
Conversational search, unlike single-turn retrieval tasks, requires understanding the current
question within a dialogue context. The common approach of rewrite-then-retrieve aims to …

Direct large language model alignment through self-rewarding contrastive prompt distillation

A Liu, H Bai, Z Lu, X Kong, S Wang, J Shan… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligning large language models (LLMs) with human expectations without human-annotated
preference data is an important problem. In this paper, we propose a method to evaluate the …