Data-Centric Human Preference Optimization with Rationales

HA Just, M Jin, A Sahu, H Phan, R Jia - arXiv preprint arXiv:2407.14477, 2024 - arxiv.org
Reinforcement learning from human feedback plays a crucial role in aligning language
models with human preferences, traditionally represented through comparisons …