Meta-Reward-Net: Implicitly differentiable reward learning for preference-based reinforcement learning

R Liu, F Bai, Y Du, Y Yang - Advances in Neural …, 2022 - proceedings.neurips.cc
Abstract: Setting up a well-designed reward function has been challenging for many
reinforcement learning applications. Preference-based reinforcement learning (PbRL) …

SURF: Semi-supervised reward learning with data augmentation for feedback-efficient preference-based reinforcement learning

J Park, Y Seo, J Shin, H Lee, P Abbeel… - arXiv preprint arXiv …, 2022 - arxiv.org
Preference-based reinforcement learning (RL) has shown potential for teaching agents to
perform the target tasks without a costly, pre-defined reward function by learning the reward …

Direct preference-based policy optimization without reward modeling

G An, J Lee, X Zuo, N Kosaka… - Advances in Neural …, 2023 - proceedings.neurips.cc
Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to
learn from preference, which is particularly useful when formulating a reward function is …

A survey of preference-based reinforcement learning methods

C Wirth, R Akrour, G Neumann, J Fürnkranz - Journal of Machine Learning …, 2017 - jmlr.org
Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a
suitably chosen reward function. However, designing such a reward function often requires …

Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …

B-Pref: Benchmarking preference-based reinforcement learning

K Lee, L Smith, A Dragan, P Abbeel - arXiv preprint arXiv:2111.03026, 2021 - arxiv.org
Reinforcement learning (RL) requires access to a reward function that incentivizes the right
behavior, but these are notoriously hard to specify for complex tasks. Preference-based RL …

Reward uncertainty for exploration in preference-based reinforcement learning

X Liang, K Shu, K Lee, P Abbeel - arXiv preprint arXiv:2205.12401, 2022 - arxiv.org
Conveying complex objectives to reinforcement learning (RL) agents often requires
meticulous reward engineering. Preference-based RL methods are able to learn a more …

Reinforcement learning from diverse human preferences

W Xue, B An, S Yan, Z Xu - arXiv preprint arXiv:2301.11774, 2023 - arxiv.org
The complexity of designing reward functions has been a major obstacle to the wide
application of deep reinforcement learning (RL) techniques. Describing an agent's desired …

Weak human preference supervision for deep reinforcement learning

Z Cao, KC Wong, CT Lin - IEEE Transactions on Neural …, 2021 - ieeexplore.ieee.org
The current reward learning from human preferences could be used to resolve complex
reinforcement learning (RL) tasks without access to a reward function by defining a single …

Few-shot preference learning for human-in-the-loop RL

DJ Hejna III, D Sadigh - Conference on Robot Learning, 2023 - proceedings.mlr.press
While reinforcement learning (RL) has become a more popular approach for robotics,
designing sufficiently informative reward functions for complex tasks has proven to be …