Related articles - Academic Resource Search

SURF: Semi-supervised reward learning with data augmentation for feedback-efficient preference-based reinforcement learning

J Park, Y Seo, J Shin, H Lee, P Abbeel… - arXiv preprint arXiv …, 2022 - arxiv.org

Preference-based reinforcement learning (RL) has shown potential for teaching agents to
perform the target tasks without a costly, pre-defined reward function by learning the reward …

Cited by 69 · Related articles · All 6 versions

[PDF] neurips.cc

Meta-reward-net: Implicitly differentiable reward learning for preference-based reinforcement learning

R Liu, F Bai, Y Du, Y Yang - Advances in Neural …, 2022 - proceedings.neurips.cc

Abstract Setting up a well-designed reward function has been challenging for many
reinforcement learning applications. Preference-based reinforcement learning (PbRL) …

Cited by 34 · Related articles · All 6 versions

[PDF] arxiv.org

B-pref: Benchmarking preference-based reinforcement learning

K Lee, L Smith, A Dragan, P Abbeel - arXiv preprint arXiv:2111.03026, 2021 - arxiv.org

Reinforcement learning (RL) requires access to a reward function that incentivizes the right
behavior, but these are notoriously hard to specify for complex tasks. Preference-based RL …

Cited by 91 · Related articles · All 7 versions

[PDF] arxiv.org

Pebble: Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training

K Lee, L Smith, P Abbeel - arXiv preprint arXiv:2106.05091, 2021 - arxiv.org

Conveying complex objectives to reinforcement learning (RL) agents can often be difficult,
involving meticulous design of reward functions that are sufficiently informative yet easy …

Cited by 227 · Related articles · All 6 versions

[PDF] arxiv.org

Weak human preference supervision for deep reinforcement learning

Z Cao, KC Wong, CT Lin - IEEE Transactions on Neural …, 2021 - ieeexplore.ieee.org

The current reward learning from human preferences could be used to resolve complex
reinforcement learning (RL) tasks without access to a reward function by defining a single …

Cited by 38 · Related articles · All 8 versions

[PDF] arxiv.org

Reward uncertainty for exploration in preference-based reinforcement learning

X Liang, K Shu, K Lee, P Abbeel - arXiv preprint arXiv:2205.12401, 2022 - arxiv.org

Conveying complex objectives to reinforcement learning (RL) agents often requires
meticulous reward engineering. Preference-based RL methods are able to learn a more …

Cited by 57 · Related articles · All 5 versions

[PDF] arxiv.org

Benchmarks and algorithms for offline preference-based reward learning

D Shin, AD Dragan, DS Brown - arXiv preprint arXiv:2301.01392, 2023 - arxiv.org

Learning a reward function from human preferences is challenging as it typically requires
having a high-fidelity simulator or using expensive and potentially unsafe actual physical …

Cited by 46 · Related articles · All 4 versions

[PDF] arxiv.org

Text2reward: Automated dense reward function generation for reinforcement learning

T Xie, S Zhao, CH Wu, Y Liu, Q Luo, V Zhong… - arXiv preprint arXiv …, 2023 - arxiv.org

Designing reward functions is a longstanding challenge in reinforcement learning (RL); it
requires specialized knowledge or domain data, leading to high costs for development. To …

Cited by 18 · Related articles · All 2 versions

[PDF] arxiv.org

Reinforcement learning from diverse human preferences

W Xue, B An, S Yan, Z Xu - arXiv preprint arXiv:2301.11774, 2023 - arxiv.org

The complexity of designing reward functions has been a major obstacle to the wide
application of deep reinforcement learning (RL) techniques. Describing an agent's desired …

Cited by 17 · Related articles · All 3 versions

[PDF] neurips.cc

Direct preference-based policy optimization without reward modeling

G An, J Lee, X Zuo, N Kosaka… - Advances in Neural …, 2023 - proceedings.neurips.cc

Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to
learn from preference, which is particularly useful when formulating a reward function is …

Cited by 11 · Related articles · All 5 versions
