AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

A tutorial on ultrareliable and low-latency communications in 6G: Integrating domain knowledge into deep learning

C She, C Sun, Z Gu, Y Li, C Yang… - Proceedings of the …, 2021 - ieeexplore.ieee.org
As one of the key communication scenarios in the fifth-generation (5G) and sixth-
generation (6G) mobile communication networks, ultrareliable and low-latency …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Eureka: Human-level reward design via coding large language models

YJ Ma, W Liang, G Wang, DA Huang, O Bastani… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have excelled as high-level semantic planners for
sequential decision-making tasks. However, harnessing them to learn complex low-level …

Defining and characterizing reward gaming

J Skalse, N Howe… - Advances in Neural …, 2022 - proceedings.neurips.cc
We provide the first formal definition of reward hacking, a phenomenon where
optimizing an imperfect proxy reward function, $\tilde{\mathcal{R}}$, leads to poor …
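
The snippet truncates before the formal definition, but the failure mode it names is easy to demonstrate. The following minimal Python sketch (a hypothetical toy, not the paper's formalism) hill-climbs on a proxy reward that tracks the true reward for small actions but keeps paying out indefinitely, so the proxy keeps improving while the true reward collapses; both reward functions are invented for illustration.

```python
# Toy illustration of reward hacking (hypothetical, not from the paper):
# optimizing an imperfect proxy reward eventually drives the true reward down.

def true_reward(a: float) -> float:
    # True objective: peaks at a = 0.5 and penalizes extreme actions.
    return a - a ** 2

def proxy_reward(a: float) -> float:
    # Imperfect proxy: matches the true reward's direction near a = 0,
    # but keeps rewarding ever-larger actions.
    return a

# Naive hill-climbing on the proxy reward.
a, step = 0.0, 0.1
for _ in range(20):
    a += step  # the proxy's gradient is always +1, so a grows without bound
    print(f"a={a:.1f}  proxy={proxy_reward(a):+.2f}  true={true_reward(a):+.2f}")
# Up to a = 0.5 both rewards rise together; beyond it the proxy keeps
# improving while the true reward falls below zero.
```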

VIP: Towards universal visual reward and representation via value-implicit pre-training

YJ Ma, S Sodhani, D Jayaraman, O Bastani… - arXiv preprint arXiv …, 2022 - arxiv.org
Reward and representation learning are two long-standing challenges for learning an
expanding set of robot manipulation skills from sensory observations. Given the inherent …

Multi-agent deep reinforcement learning: a survey

S Gronauer, K Diepold - Artificial Intelligence Review, 2022 - Springer
Advances in reinforcement learning have achieved remarkable success in various domains.
Although the multi-agent domain has been overshadowed by its single-agent counterpart …

Safe RLHF: Safe reinforcement learning from human feedback

J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the development of large language models (LLMs), striking a balance between the
performance and safety of AI systems has never been more critical. However, the inherent …

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

A Rame, G Couairon, C Dancette… - Advances in …, 2024 - proceedings.neurips.cc
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned
on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further …
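
The title states the mechanism: models are fine-tuned separately on diverse rewards, and their weights are linearly interpolated. Below is a minimal Python sketch of that idea (hypothetical toy parameters, not the authors' code); `interpolate_weights` and the two parameter dictionaries are invented for illustration.

```python
# Hypothetical sketch of rewarded-soup-style weight interpolation: given two
# parameter sets fine-tuned on different rewards, form a convex combination.

def interpolate_weights(theta_a: dict, theta_b: dict, lam: float) -> dict:
    """Return lam * theta_a + (1 - lam) * theta_b, parameter by parameter."""
    return {name: lam * theta_a[name] + (1.0 - lam) * theta_b[name]
            for name in theta_a}

theta_a = {"w": 1.0, "b": 0.5}   # toy weights fine-tuned on reward A
theta_b = {"w": -1.0, "b": 2.0}  # toy weights fine-tuned on reward B

# Sweeping lam in [0, 1] yields a family of models whose reward trade-offs
# approximate a Pareto front, without any retraining.
for lam in (0.0, 0.5, 1.0):
    print(lam, interpolate_weights(theta_a, theta_b, lam))
```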

Transfer learning in deep reinforcement learning: A survey

Z Zhu, K Lin, AK Jain, J Zhou - IEEE Transactions on Pattern …, 2023 - ieeexplore.ieee.org
Reinforcement learning is a learning paradigm for solving sequential decision-making
problems. Recent years have witnessed remarkable progress in reinforcement learning …