Reward specification is a notoriously difficult problem in reinforcement learning, requiring extensive expert supervision to design robust reward functions. Imitation learning (IL) …
J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Reward functions are difficult to design and often hard to align with human intent. Preference-based Reinforcement Learning (RL) algorithms address these problems by learning reward …
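For orientation, the reward-learning step these preference-based methods share is usually modeled with a Bradley-Terry style likelihood over trajectory segments (following Christiano et al., 2017); the notation below (σ¹, σ², r̂_θ) is ours, and this is a sketch of the standard form rather than any one paper's exact objective:

```latex
% Standard Bradley-Terry preference model over trajectory segments
% (common in preference-based RL; \sigma^1, \sigma^2 are segments and
%  \hat{r}_\theta is the learned reward -- notation assumed here).
P(\sigma^1 \succ \sigma^2)
  = \frac{\exp \sum_t \hat{r}_\theta(s^1_t, a^1_t)}
         {\exp \sum_t \hat{r}_\theta(s^1_t, a^1_t)
        + \exp \sum_t \hat{r}_\theta(s^2_t, a^2_t)},
\qquad
\mathcal{L}(\theta)
  = -\,\mathbb{E}_{\sigma^1 \succ \sigma^2}
    \left[ \log P(\sigma^1 \succ \sigma^2) \right].
```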
Learning from active human involvement enables the human subject to actively intervene and demonstrate to the AI agent during training. The interaction and corrective feedback …
Customizing robotic behaviors to be aligned with diverse human preferences is an underexplored challenge in the field of embodied AI. In this paper, we present Promptable …
E Bıyık, N Huynh, MJ Kochenderfer… - … Journal of Robotics …, 2024 - journals.sagepub.com
Designing reward functions is a difficult task in AI and robotics. Directly specifying all the desirable behaviors a robot needs to optimize often proves challenging for …
Language models influence the external world: they query APIs that read and write to web pages, generate content that shapes human behavior, and run system commands as …
Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that aligns language models closely with human-centric values. The initial phase of RLHF involves …
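Once a reward model r_φ is fit, the policy-optimization phase that typically follows in RLHF is KL-regularized against a reference model; a sketch in the usual notation (β and π_ref are the standard-setup KL coefficient and reference policy, assumptions not taken from the snippet above):

```latex
% KL-regularized RLHF policy objective (standard form; r_\phi is the
% learned reward model, \beta the KL coefficient, \pi_{\mathrm{ref}}
% the frozen reference policy -- all assumed notation).
\max_{\pi_\theta} \;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
\left[ r_\phi(x, y) \right]
\;-\; \beta \, \mathbb{D}_{\mathrm{KL}}\!\left[
  \pi_\theta(y \mid x) \,\|\, \pi_{\mathrm{ref}}(y \mid x) \right].
```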
Reward hacking occurs when an agent performs very well with respect to a "proxy" reward function (which may be hand-specified or learned), but poorly with respect to the unknown …
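One common way to make this precise, roughly in the spirit of the hackability definitions from this literature (our notation; the truncated abstract may formalize it differently): a proxy R̃ is hackable with respect to the true reward R when improving the proxy return can strictly worsen the true return:

```latex
% A proxy \tilde{R} is hackable w.r.t. the true reward R if, for the
% policy return J, some proxy improvement is a true-reward regression:
\exists\, \pi, \pi' :\quad
J_{\tilde{R}}(\pi') > J_{\tilde{R}}(\pi)
\;\;\text{and}\;\;
J_{R}(\pi') < J_{R}(\pi).
```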
We consider algorithms for learning reward functions from human preferences over pairs of trajectory segments, as used in reinforcement learning from human feedback (RLHF). Most …
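As a concrete illustration of this setup, here is a minimal sketch of fitting a reward model to segment-pair preferences with the logistic (Bradley-Terry) loss; the network architecture, data shapes, and hyperparameters are illustrative assumptions, not taken from any of the papers above:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a per-timestep observation vector to a scalar reward.
    Architecture is an illustrative assumption."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def segment_return(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, T, obs_dim) -> predicted return: (batch,)
        return self.net(segment).squeeze(-1).sum(dim=-1)

def preference_loss(model, seg_a, seg_b, prefs):
    """Bradley-Terry cross-entropy on segment pairs.
    prefs[i] = 1.0 if the human preferred seg_a[i], else 0.0."""
    logits = model.segment_return(seg_a) - model.segment_return(seg_b)
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)

# Toy usage on random tensors (shapes only; no real preference data).
obs_dim, T, batch = 8, 25, 16
model = RewardModel(obs_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seg_a = torch.randn(batch, T, obs_dim)
seg_b = torch.randn(batch, T, obs_dim)
prefs = torch.randint(0, 2, (batch,)).float()
loss = preference_loss(model, seg_a, seg_b, prefs)
opt.zero_grad(); loss.backward(); opt.step()
```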