Direct preference optimization: Your language model is secretly a reward model

R Rafailov, A Sharma, E Mitchell… - Advances in …, 2024 - proceedings.neurips.cc
While large-scale unsupervised language models (LMs) learn broad world knowledge and
some reasoning skills, achieving precise control of their behavior is difficult due to the …
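
The DPO objective this entry refers to is a pairwise logistic loss on policy-vs-reference log-probability ratios; a minimal PyTorch-style sketch under that reading, where the tensor names and the beta value are illustrative placeholders rather than anything taken from the paper's code:

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Each input: 1-D tensor of summed token log-probs log pi(y|x) for the
    # chosen / rejected completion of each preference pair.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * (chosen margin minus rejected margin, relative to
    # the frozen reference model)); averaged over the batch of pairs.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()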

RoboCLIP: One demonstration is enough to learn robot policies

S Sontakke, J Zhang, S Arnold… - Advances in …, 2024 - proceedings.neurips.cc
Reward specification is a notoriously difficult problem in reinforcement learning, requiring
extensive expert supervision to design robust reward functions. Imitation learning (IL) …

Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Reward functions are difficult to design and often hard to align with human intent. Preference-based
Reinforcement Learning (RL) algorithms address these problems by learning reward …

Learning from active human involvement through proxy value propagation

ZM Peng, W Mo, C Duan, Q Li… - Advances in neural …, 2024 - proceedings.neurips.cc
Learning from active human involvement enables the human subject to actively intervene
and demonstrate to the AI agent during training. The interaction and corrective feedback …

Promptable behaviors: Personalizing multi-objective rewards from human preferences

M Hwang, L Weihs, C Park, K Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com
Customizing robotic behaviors to be aligned with diverse human preferences is an
underexplored challenge in the field of embodied AI. In this paper we present Promptable …

Active preference-based Gaussian process regression for reward learning and optimization

E Bıyık, N Huynh, MJ Kochenderfer… - … Journal of Robotics …, 2024 - journals.sagepub.com
Designing reward functions is a difficult task in AI and robotics. The complex task of directly
specifying all the desirable behaviors a robot needs to optimize often proves challenging for …

Feedback loops with language models drive in-context reward hacking

A Pan, E Jones, M Jagadeesan, J Steinhardt - arXiv preprint arXiv …, 2024 - arxiv.org
Language models influence the external world: they query APIs that read and write to web
pages, generate content that shapes human behavior, and run system commands as …

Iterative data smoothing: Mitigating reward overfitting and overoptimization in RLHF

B Zhu, MI Jordan, J Jiao - arXiv preprint arXiv:2401.16335, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that aligns
language models closely with human-centric values. The initial phase of RLHF involves …

Preventing reward hacking with occupancy measure regularization

C Laidlaw, S Singhal, A Dragan - arXiv preprint arXiv:2403.03185, 2024 - arxiv.org
Reward hacking occurs when an agent performs very well with respect to a "proxy" reward
function (which may be hand-specified or learned), but poorly with respect to the unknown …

Learning optimal advantage from preferences and mistaking it for reward

WB Knox, S Hatgis-Kessell, SO Adalgeirsson… - Proceedings of the …, 2024 - ojs.aaai.org
We consider algorithms for learning reward functions from human preferences over pairs of
trajectory segments, as used in reinforcement learning from human feedback (RLHF). Most …
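
For the segment-pair preference setting this last entry describes, the usual objective is a Bradley-Terry likelihood on summed per-step rewards; a minimal sketch of that generic loss, not of this paper's advantage-versus-reward analysis, with reward_net and the segment tensor shapes assumed for illustration:

import torch
import torch.nn.functional as F

def segment_preference_loss(reward_net, seg_preferred, seg_rejected):
    # seg_preferred / seg_rejected: (batch, horizon, obs_dim) tensors holding
    # the segment the human preferred and the one they did not.
    # reward_net maps a (batch*horizon, obs_dim) batch of states to scalar rewards.
    b, h, d = seg_preferred.shape
    # Sum predicted per-step rewards over each segment.
    r_pref = reward_net(seg_preferred.reshape(b * h, d)).reshape(b, h).sum(dim=1)
    r_rej = reward_net(seg_rejected.reshape(b * h, d)).reshape(b, h).sum(dim=1)
    # Bradley-Terry: P(preferred > rejected) = sigmoid(r_pref - r_rej);
    # minimize the negative log-likelihood of the human labels.
    return -F.logsigmoid(r_pref - r_rej).mean()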