Direct preference-based policy optimization without reward modeling

G An, J Lee, X Zuo, N Kosaka… - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to
learn from preferences, which is particularly useful when formulating a reward function is …

Advances in preference-based reinforcement learning: A review

Y Abdelkareem, S Shehata… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Reinforcement Learning (RL) algorithms suffer from the dependency on accurately
engineered reward functions to properly guide the learning agents to do the required tasks …

A survey of preference-based reinforcement learning methods

C Wirth, R Akrour, G Neumann, J Fürnkranz - Journal of Machine Learning Research, 2017 - jmlr.org
Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a
suitably chosen reward function. However, designing such a reward function often requires …
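For reference, the "accumulated long-term reward" these methods optimize is the standard discounted-return objective (standard RL notation, not notation specific to this survey):

\[ J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right], \qquad \gamma \in [0, 1) \]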

Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …

Query-policy misalignment in preference-based reinforcement learning

X Hu, J Li, X Zhan, QS Jia, YQ Zhang - arXiv preprint arXiv:2305.17400, 2023 - arxiv.org
Preference-based reinforcement learning (PbRL) provides a natural way to align RL agents'
behavior with human desired outcomes, but is often restrained by costly human feedback …

Meta-reward-net: Implicitly differentiable reward learning for preference-based reinforcement learning

R Liu, F Bai, Y Du, Y Yang - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
Setting up a well-designed reward function has been challenging for many
reinforcement learning applications. Preference-based reinforcement learning (PbRL) …
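As background for the reward-learning step that reward-model-based PbRL methods share, below is a minimal PyTorch-style sketch of fitting a reward model to pairwise segment preferences with a cross-entropy (Bradley-Terry) loss. All names, shapes, and hyperparameters are illustrative, and this is a generic sketch of the setting rather than the implicitly differentiable bilevel scheme of Meta-Reward-Net itself.

# Minimal sketch: learn a per-step reward model from pairwise segment preferences.
# Everything below (shapes, hyperparameters, random stand-in data) is illustrative.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Per-step reward estimates (batch, horizon, 1), summed into segment returns (batch, 1).
        return self.net(torch.cat([obs, act], dim=-1)).sum(dim=1)

def preference_loss(model, seg0, seg1, labels):
    # Cross-entropy on pairwise labels: labels[i] = 1 means segment 1 was preferred.
    ret0 = model(*seg0)                        # predicted return of segment 0, (batch, 1)
    ret1 = model(*seg1)                        # predicted return of segment 1, (batch, 1)
    logits = torch.cat([ret0, ret1], dim=-1)   # Bradley-Terry logits, (batch, 2)
    return nn.functional.cross_entropy(logits, labels)

# Hypothetical usage with random stand-in data (batch 8, horizon 50, obs_dim 4, act_dim 2).
model = RewardModel(obs_dim=4, act_dim=2)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
seg0 = (torch.randn(8, 50, 4), torch.randn(8, 50, 2))
seg1 = (torch.randn(8, 50, 4), torch.randn(8, 50, 2))
labels = torch.randint(0, 2, (8,))
loss = preference_loss(model, seg0, seg1, labels)
opt.zero_grad()
loss.backward()
opt.step()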

Reward uncertainty for exploration in preference-based reinforcement learning

X Liang, K Shu, K Lee, P Abbeel - arXiv preprint arXiv:2205.12401, 2022 - arxiv.org
Conveying complex objectives to reinforcement learning (RL) agents often requires
meticulous reward engineering. Preference-based RL methods are able to learn a more …

Provable reward-agnostic preference-based reinforcement learning

W Zhan, M Uehara, W Sun, JD Lee - arXiv preprint arXiv:2305.18505, 2023 - arxiv.org
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent
learns to optimize a task using pair-wise preference-based feedback over trajectories, rather …
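For context, pairwise preference feedback over trajectories is commonly formalized in the PbRL literature with a Bradley-Terry style model (a common formalization, not necessarily the exact setup of this paper), where a trajectory's score is the sum of an underlying reward along it:

\[ P(\tau^1 \succ \tau^0) = \frac{\exp\!\left(\sum_t r(s_t^1, a_t^1)\right)}{\exp\!\left(\sum_t r(s_t^0, a_t^0)\right) + \exp\!\left(\sum_t r(s_t^1, a_t^1)\right)} \]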

Preference-based reinforcement learning: a formal framework and a policy iteration algorithm

J Fürnkranz, E Hüllermeier, W Cheng, SH Park - Machine Learning, 2012 - Springer
This paper makes a first step toward the integration of two subfields of machine learning,
namely preference learning and reinforcement learning (RL). An important motivation for a …

Preference-based reinforcement learning with finite-time guarantees

Y Xu, R Wang, L Yang, A Singh… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
Preference-based Reinforcement Learning (PbRL) replaces reward values in
traditional reinforcement learning by preferences to better elicit human opinion on the target …