Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Evaluating instruction-tuned large language models on code comprehension and generation

Z Yuan, J Liu, Q Zi, M Liu, X Peng, Y Lou - arXiv preprint arXiv:2308.01240, 2023 - arxiv.org
In this work, we evaluate 10 open-source instructed LLMs on four representative code
comprehension and generation tasks. We have the following main findings. First, for the zero …

SALMON: Self-alignment with principle-following reward models

Z Sun, Y Shen, H Zhang, Q Zhou, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement
Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM …

FLASK: Fine-grained language model evaluation based on alignment skill sets

S Ye, D Kim, S Kim, H Hwang, S Kim, Y Jo… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluation of Large Language Models (LLMs) is challenging because aligning to human
values requires the composition of multiple skills and the required set of skills varies …

Large language model alignment: A survey

T Shen, R Jin, Y Huang, C Liu, W Dong, Z Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent years have witnessed remarkable progress in large language models (LLMs).
Such advancements, while garnering significant attention, have concurrently elicited various …

Personalized soups: Personalized large language model alignment via post-hoc parameter merging

J Jang, S Kim, BY Lin, Y Wang, J Hessel… - arXiv preprint arXiv …, 2023 - arxiv.org
While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language
Models (LLMs) with general, aggregate human preferences, it is suboptimal for learning …

From Instructions to Intrinsic Human Values--A Survey of Alignment Goals for Big Models

J Yao, X Yi, X Wang, J Wang, X Xie - arXiv preprint arXiv:2308.12014, 2023 - arxiv.org
Big models, exemplified by Large Language Models (LLMs), are models typically pre-
trained on massive data and comprising an enormous number of parameters, which not only obtain …

EvalLM: Interactive evaluation of large language model prompts on user-defined criteria

TS Kim, Y Lee, J Shin, YH Kim, J Kim - arXiv preprint arXiv:2309.13633, 2023 - arxiv.org
By simply composing prompts, developers can prototype novel generative applications with
Large Language Models (LLMs). To refine prototypes into products, however, developers …

A survey on knowledge distillation of large language models

X Xu, M Li, C Tao, T Shen, R Cheng, J Li, C Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents an in-depth exploration of knowledge distillation (KD) techniques
within the realm of Large Language Models (LLMs), spotlighting the pivotal role of KD in …

On the limitations of simulating active learning

K Margatina, N Aletras - arXiv preprint arXiv:2305.13342, 2023 - arxiv.org
Active learning (AL) is a human-and-model-in-the-loop paradigm that iteratively selects
informative unlabeled data for human annotation, aiming to improve over random sampling …