Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Cited by 414 Related articles All 6 versions

[PDF] arxiv.org

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Cited by 108 Related articles All 4 versions

[PDF] mlr.press

Optimistic linear support and successor features as a basis for optimal policy transfer

LN Alegre, A Bazzan… - … conference on machine …, 2022 - proceedings.mlr.press

In many real-world applications, reinforcement learning (RL) agents might have to solve
multiple tasks, each one typically modeled via a reward function. If reward functions are …

Cited by 42 Related articles All 7 versions

[PDF] google.com

Autonomous underwater manipulation: Current trends in dynamics, control, planning, perception, and future directions

E Morgan, I Carlucho, W Ard, C Barbalata - Current Robotics Reports, 2022 - Springer

Abstract. Purpose of Review: Research in underwater manipulation has mostly focused on
solving individual parts of the manipulation challenge; however, we believe a systemic …

Cited by 10 Related articles All 2 versions

[PDF] springer.com

Explainable reinforcement learning for Broad-XAI: a conceptual framework and survey

R Dazeley, P Vamplew, F Cruz - Neural Computing and Applications, 2023 - Springer

Broad-XAI moves away from interpreting individual decisions based on a single datum and
aims to provide integrated explanations from multiple machine learning algorithms into a …

Cited by 65 Related articles All 9 versions

[PDF] arxiv.org

Beyond reward: Offline preference-guided policy optimization

Y Kang, D Shi, J Liu, L He, D Wang - arXiv preprint arXiv:2305.16217, 2023 - arxiv.org

This study focuses on the topic of offline preference-based reinforcement learning (PbRL), a
variant of conventional reinforcement learning that dispenses with the need for online …

Cited by 30 Related articles All 7 versions

[PDF] arxiv.org

On the link between conscious function and general intelligence in humans and machines

A Juliani, K Arulkumaran, S Sasai, R Kanai - arXiv preprint arXiv …, 2022 - arxiv.org

In popular media, there is often a connection drawn between the advent of awareness in
artificial agents and those same agents simultaneously achieving human or superhuman …

Cited by 34 Related articles All 4 versions

[PDF] ieee.org

Comprehensive overview of reward engineering and shaping in advancing reinforcement learning applications

S Ibrahim, M Mostafa, A Jnadi, H Salloum… - IEEE …, 2024 - ieeexplore.ieee.org

Reinforcement Learning (RL) seeks to develop systems capable of autonomous decision-making
by learning through interaction with their environment. Central to this process are …

Cited by 1 Related articles All 5 versions

[PDF] mlr.press

On the limitations of Markovian rewards to express multi-objective, risk-sensitive, and modal tasks

J Skalse, A Abate - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press

In this paper, we study the expressivity of scalar, Markovian reward functions in
Reinforcement Learning (RL), and identify several limitations to what they can express …

Cited by 12 Related articles All 8 versions

[PDF] umass.edu

[PDF] MO-Gym: A library of multi-objective reinforcement learning environments

LN Alegre, F Felten, EG Talbi, G Danoy… - Proceedings of the …, 2022 - people.cs.umass.edu

We introduce MO-Gym, an extensible library containing a diverse set of multi-objective
reinforcement learning environments. It introduces a standardized API that facilitates …

Cited by 32 Related articles All 5 versions
