UltraFeedback: Boosting language models with high-quality feedback

G Cui, L Yuan, N Ding, G Yao, W Zhu, Y Ni… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) has become a pivotal technique in
aligning large language models (LLMs) with human preferences. In RLHF practice …

RRHF: Rank responses to align language models with human feedback without tears

Z Yuan, H Yuan, C Tan, W Wang, S Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large
language models with human preferences, significantly enhancing the quality of interactions …

RRHF: Rank responses to align language models with human feedback

H Yuan, Z Yuan, C Tan, W Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of
large language models with human preferences, significantly enhancing the quality of …

RLAIF: Scaling reinforcement learning from human feedback with AI feedback

H Lee, S Phatale, H Mansoor, K Lu, T Mesnard… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) has proven effective in aligning large
language models (LLMs) with human preferences. However, gathering high-quality human …

trlX: A framework for large scale reinforcement learning from human feedback

A Havrilla, M Zhuravinskyi, D Phung… - Proceedings of the …, 2023 - aclanthology.org
Reinforcement learning from human feedback (RLHF) utilizes human feedback to better
align large language models with human preferences via online optimization against a …

Secrets of RLHF in large language models part II: Reward modeling

B Wang, R Zheng, L Chen, Y Liu, S Dou… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology
for aligning language models with human values and intentions, enabling models to …

Nash learning from human feedback

R Munos, M Valko, D Calandriello, MG Azar… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm
for aligning large language models (LLMs) with human preferences. Typically, RLHF …

Fine-tuning language models with advantage-induced policy alignment

B Zhu, H Sharma, FV Frujeri, S Dong, C Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach
to aligning large language models (LLMs) with human preferences. Among the plethora of …

Personalized language modeling from personalized human feedback

X Li, ZC Lipton, L Leqi - arXiv preprint arXiv:2402.05133, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) is currently the dominant framework
for fine-tuning large language models to better align with human preferences. However, the …

RLHF workflow: From reward modeling to online RLHF

H Dong, W Xiong, B Pang, H Wang, H Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present the workflow of Online Iterative Reinforcement Learning
from Human Feedback (RLHF), which is widely reported to outperform its offline counterpart …