Z Yuan, H Yuan, C Tan, W Wang, S Huang… - arXiv e-prints, 2023 - ui.adsabs.harvard.edu
Abstract: Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models with human preferences, significantly enhancing the quality of …