A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Promptable behaviors: Personalizing multi-objective rewards from human preferences

M Hwang, L Weihs, C Park, K Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com
Customizing robotic behaviors to be aligned with diverse human preferences is an
underexplored challenge in the field of embodied AI. In this paper we present Promptable …

Sequential preference ranking for efficient reinforcement learning from human feedback

M Hwang, G Lee, H Kee, CW Kim… - Advances in Neural …, 2024 - proceedings.neurips.cc
Reinforcement learning from human feedback (RLHF) alleviates the problem of designing a
task-specific reward function in reinforcement learning by learning it from human preference …

PECAN: Leveraging policy ensemble for context-aware zero-shot human-AI coordination

X Lou, J Guo, J Zhang, J Wang, K Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Zero-shot human-AI coordination holds the promise of collaborating with humans without
human data. Prevailing methods try to train the ego agent with a population of partners via …

A human-centered safe robot reinforcement learning framework with interactive behaviors

S Gu, A Kshirsagar, Y Du, G Chen, J Peters… - Frontiers in …, 2023 - frontiersin.org
Deployment of Reinforcement Learning (RL) algorithms for robotics applications in the real
world requires ensuring the safety of the robot and its environment. Safe Robot RL (SRRL) is …

Cooperative multi-agent learning in a complex world: challenges and solutions

Y Du - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Over the past few years, artificial intelligence (AI) has achieved great success in a variety of
applications, such as image classification and recommendation systems. This success has …

PrefCLM: Enhancing preference-based reinforcement learning with crowdsourced large language models

R Wang, D Zhao, Z Yuan, I Obi, BC Min - arXiv preprint arXiv:2407.08213, 2024 - arxiv.org
Preference-based reinforcement learning (PbRL) is emerging as a promising approach to
teaching robots through human comparative feedback, sidestepping the need for complex …

Foresight distribution adjustment for off-policy reinforcement learning

R Chen, XH Liu, TS Liu, S Jiang, F Xu… - Proceedings of the 23rd …, 2024 - ifaamas.org
Off-policy reinforcement learning algorithms maintain a replay buffer to utilize samples
obtained from earlier policies. The sampling strategy that prioritizes certain data in a buffer to …

Explore 3D dance generation via reward model from automatically-ranked demonstrations

Z Wang, H Zhuang, L Li, Y Zhang, J Zhong… - Proceedings of the …, 2024 - ojs.aaai.org
This paper presents an Exploratory 3D Dance generation framework, E3D2, designed to
address the exploration capability deficiency in existing music-conditioned 3D dance …

Autopilot controller of fixed-wing planes based on curriculum reinforcement learning scheduled by adaptive learning curve

L Li, X Zhang, C Qian, R Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In this paper, we present a novel curriculum reinforcement learning method that can
automatically generate a high-performance autopilot controller for a 6-degree-of-freedom (6 …