Principle-driven self-alignment of language models from scratch with minimal human supervision

Z Sun, Y Shen, Q Zhou, H Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning
(SFT) with human annotations and reinforcement learning from human feedback (RLHF) to …

SALMON: Self-alignment with principle-following reward models

Z Sun, Y Shen, H Zhang, Q Zhou, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement
Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM …

Self-alignment with instruction backtranslation

X Li, P Yu, C Zhou, T Schick, L Zettlemoyer… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a scalable method to build a high-quality instruction-following language model
by automatically labelling human-written text with corresponding instructions. Our approach …

OpenAssistant conversations - democratizing large language model alignment

A Köpf, Y Kilcher, D von Rütte… - Advances in …, 2024 - proceedings.neurips.cc
Aligning large language models (LLMs) with human preferences has proven to drastically
improve usability and has driven rapid adoption as demonstrated by ChatGPT. Alignment …

Feature adaptation of pre-trained language models across languages and domains with robust self-training

H Ye, Q Tan, R He, J Li, HT Ng, L Bing - arXiv preprint arXiv:2009.11538, 2020 - arxiv.org
Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains has gained
much attention recently. Instead of fine-tuning PrLMs as done in most previous work, we …

LIMA: Less is more for alignment

C Zhou, P Liu, P Xu, S Iyer, J Sun… - Advances in …, 2024 - proceedings.neurips.cc
Large language models are trained in two stages: (1) unsupervised pretraining from raw text,
to learn general-purpose representations, and (2) large scale instruction tuning and …

Generative judge for evaluating alignment

J Li, S Sun, W Yuan, RZ Fan, H Zhao, P Liu - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid development of Large Language Models (LLMs) has substantially expanded the
range of tasks they can address. In the field of Natural Language Processing (NLP) …

Aligning large language models through synthetic feedback

S Kim, S Bae, J Shin, S Kang, D Kwak, KM Yoo… - arXiv preprint arXiv …, 2023 - arxiv.org
Aligning large language models (LLMs) to human values has become increasingly
important as it enables sophisticated steering of LLMs. However, it requires significant …

Secrets of RLHF in large language models part I: PPO

R Zheng, S Dou, S Gao, Y Hua, W Shen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have formulated a blueprint for the advancement of artificial
general intelligence. Their primary objective is to function as a human-centric (helpful, honest …

The unlocking spell on base LLMs: Rethinking alignment via in-context learning

BY Lin, A Ravichander, X Lu, N Dziri… - The Twelfth …, 2023 - openreview.net
Alignment tuning has become the de facto standard practice for enabling base large
language models (LLMs) to serve as open-domain AI assistants. The alignment tuning …