Principle-driven self-alignment of language models from scratch with minimal human supervision

Z Sun, Y Shen, Q Zhou, H Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning
(SFT) with human annotations and reinforcement learning from human feedback (RLHF) to …

Direct language model alignment from online AI feedback

S Guo, B Zhang, T Liu, T Liu, M Khalman… - arXiv preprint arXiv …, 2024 - arxiv.org
Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as
efficient alternatives to reinforcement learning from human feedback (RLHF) that do not …
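For orientation, a minimal sketch of the standard DPO objective that DAP methods of this kind build on; this illustrates the general technique named in the snippet, not this paper's specific online-feedback procedure, and the function and argument names are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    # Sketch of the standard DPO loss: raise the policy's log-ratio margin on the
    # preferred response over the dispreferred one, relative to a frozen reference
    # model, via a logistic loss scaled by beta. Names here are illustrative.
    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        chosen_logratio = policy_chosen_logps - ref_chosen_logps
        rejected_logratio = policy_rejected_logps - ref_rejected_logps
        return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

In this framing, no separate reward model is trained; the preference signal enters the loss directly through the two log-probability ratios.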

Multilingual large language model: A survey of resources, taxonomy and frontiers

L Qin, Q Chen, Y Zhou, Z Chen, Y Li, L Liao… - arXiv preprint arXiv …, 2024 - arxiv.org
Multilingual large language models use powerful large language models to handle and
respond to queries in multiple languages, achieving remarkable …

Adversarial preference optimization

P Cheng, Y Yang, J Li, Y Dai, N Du - arXiv preprint arXiv:2311.08045, 2023 - arxiv.org
Human preference alignment is a crucial training step to improve the interaction quality of
large language models (LLMs). Existing alignment methods depend on manually annotated …

Human-instruction-free LLM self-alignment with limited samples

H Guo, Y Yao, W Shen, J Wei, X Zhang, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligning large language models (LLMs) with human values is a vital task for LLM
practitioners. Current alignment techniques have several limitations: (1) requiring a large …

Self-alignment of large language models via monopolylogue-based social scene simulation

X Pang, S Tang, R Ye, Y Xiong, B Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligning large language models (LLMs) with human values is imperative to mitigate
potential adverse effects resulting from their misuse. Drawing from the sociological insight …

Self-supervised alignment with mutual information: Learning to follow principles without preference labels

JP Fränken, E Zelikman, R Rafailov, K Gandhi… - arXiv preprint arXiv …, 2024 - arxiv.org
When prompting a language model (LM), users frequently expect the model to adhere to a
set of behavioral principles across diverse tasks, such as producing insightful content while …

The alignment ceiling: Objective mismatch in reinforcement learning from human feedback

N Lambert, R Calandra - arXiv preprint arXiv:2311.00168, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique
to make large language models (LLMs) more capable in complex settings. RLHF proceeds …
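As a point of reference for the RLHF pipeline this snippet refers to, a minimal sketch of the KL-regularized per-token reward typically optimized with PPO; the function name and the kl_coef default are illustrative assumptions, not details taken from this paper.

    import torch

    # Sketch of the shaped reward used in standard PPO-based RLHF: the reward
    # model's score minus a KL penalty that keeps the policy close to a frozen
    # reference model. Names and the coefficient are illustrative assumptions.
    def shaped_rlhf_reward(reward_model_score: torch.Tensor,
                           policy_logprob: torch.Tensor,
                           ref_logprob: torch.Tensor,
                           kl_coef: float = 0.05) -> torch.Tensor:
        # Per-token log-ratio serves as a sample estimate of KL(policy || reference).
        kl_penalty = policy_logprob - ref_logprob
        return reward_model_score - kl_coef * kl_penalty

The paper's "objective mismatch" concern sits exactly here: the policy is optimized against this proxy reward rather than against the downstream capabilities users care about.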

Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search

C Yoon, G Kim, B Jeon, S Kim, Y Jo, J Kang - arXiv preprint arXiv …, 2024 - arxiv.org
Conversational search, unlike single-turn retrieval tasks, requires understanding the current
question within a dialogue context. The common approach of rewrite-then-retrieve aims to …

Direct large language model alignment through self-rewarding contrastive prompt distillation

A Liu, H Bai, Z Lu, X Kong, S Wang, J Shan… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligning large language models (LLMs) with human expectations without human-annotated
preference data is an important problem. In this paper, we propose a method to evaluate the …