Scaling data-driven robotics with reward sketching and batch reinforcement learning

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

Cited by 248 - Related articles - All 9 versions

Inductive biases for deep learning of higher-level cognition

A Goyal, Y Bengio - Proceedings of the Royal Society A, 2022 - royalsocietypublishing.org

A fascinating hypothesis is that human and animal intelligence could be explained by a few
principles (rather than an encyclopaedic list of heuristics). If that hypothesis was correct, we …

Cited by 349 - Related articles - All 5 versions

A generalist agent

S Reed, K Zolna, E Parisotto, SG Colmenarejo… - arXiv preprint arXiv …, 2022 - arxiv.org

Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …

Cited by 810 - Related articles - All 4 versions

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Cited by 280 - Related articles - All 6 versions

Q-Transformer: Scalable offline reinforcement learning via autoregressive Q-functions

Y Chebotar, Q Vuong, K Hausman… - … on Robot Learning, 2023 - proceedings.mlr.press

In this work, we present a scalable reinforcement learning method for training multi-task
policies from large offline datasets that can leverage both human demonstrations and …

Cited by 50 - Related articles - All 6 versions

BridgeData V2: A dataset for robot learning at scale

HR Walke, K Black, TZ Zhao, Q Vuong… - … on Robot Learning, 2023 - proceedings.mlr.press

We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors
designed to facilitate research in scalable robot learning. BridgeData V2 contains 53,896 …

Cited by 48 - Related articles - All 4 versions

Efficient online reinforcement learning with offline data

PJ Ball, L Smith, I Kostrikov… - … Conference on Machine …, 2023 - proceedings.mlr.press

Sample efficiency and exploration remain major challenges in online reinforcement learning
(RL). A powerful approach that can be applied to address these issues is the inclusion of …

Cited by 87 - Related articles - All 6 versions

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Cited by 128 - Related articles - All 3 versions

Conditional object-centric learning from video

T Kipf, GF Elsayed, A Mahendran, A Stone… - arXiv preprint arXiv …, 2021 - arxiv.org

Object-centric representations are a promising path toward more systematic generalization
by providing flexible abstractions upon which compositional world models can be built …

Cited by 183 - Related articles - All 3 versions

Critic regularized regression

Z Wang, A Novikov, K Zolna, JS Merel… - Advances in …, 2020 - proceedings.neurips.cc

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy
optimization from large pre-recorded datasets without online environment interaction. It …

Cited by 318 - Related articles - All 9 versions
