A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

Cited by 230 - Related articles - All 9 versions

[PDF] arxiv.org

Large language models for robotics: A survey

F Zeng, W Gan, Y Wang, N Liu, PS Yu - arXiv preprint arXiv:2311.07226, 2023 - arxiv.org

The human ability to learn, generalize, and control complex manipulation tasks through multi-
modality feedback suggests a unique capability, which we refer to as dexterity intelligence …

Cited by 49 - Related articles - All 3 versions

[PDF] arxiv.org

Planning with diffusion for flexible behavior synthesis

M Janner, Y Du, JB Tenenbaum, S Levine - arXiv preprint arXiv …, 2022 - arxiv.org

Model-based reinforcement learning methods often use learning only for the purpose of
estimating an approximate dynamics model, offloading the rest of the decision-making work …

Cited by 374 - Related articles - All 4 versions

[PDF] mlr.press

Principled reinforcement learning with human feedback from pairwise or k-wise comparisons

B Zhu, M Jordan, J Jiao - International Conference on …, 2023 - proceedings.mlr.press

We provide a theoretical framework for Reinforcement Learning with Human Feedback
(RLHF). We show that when the underlying true reward is linear, under both Bradley-Terry …

Cited by 119 - Related articles - All 8 versions

[PDF] arxiv.org

Is conditional generative modeling all you need for decision-making?

A Ajay, Y Du, A Gupta, J Tenenbaum… - arXiv preprint arXiv …, 2022 - arxiv.org

Recent improvements in conditional generative modeling have made it possible to generate
high-quality images from language descriptions alone. We investigate whether these …

Cited by 234 - Related articles - All 4 versions

[PDF] arxiv.org

Diffusion policies as an expressive policy class for offline reinforcement learning

Z Wang, JJ Hunt, M Zhou - arXiv preprint arXiv:2208.06193, 2022 - arxiv.org

Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously
collected static dataset, is an important paradigm of RL. Standard RL methods often perform …

Cited by 210 - Related articles - All 6 versions

[PDF] neurips.cc

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc

Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …

Cited by 633 - Related articles - All 9 versions

[PDF] mlr.press

Q-transformer: Scalable offline reinforcement learning via autoregressive q-functions

Y Chebotar, Q Vuong, K Hausman… - … on Robot Learning, 2023 - proceedings.mlr.press

In this work, we present a scalable reinforcement learning method for training multi-task
policies from large offline datasets that can leverage both human demonstrations and …

Cited by 45 - Related articles - All 6 versions

[PDF] mlr.press

Bridgedata v2: A dataset for robot learning at scale

HR Walke, K Black, TZ Zhao, Q Vuong… - … on Robot Learning, 2023 - proceedings.mlr.press

We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors
designed to facilitate research in scalable robot learning. BridgeData V2 contains 53,896 …

Cited by 37 - Related articles - All 4 versions

[PDF] mlr.press

Efficient online reinforcement learning with offline data

PJ Ball, L Smith, I Kostrikov… - … Conference on Machine …, 2023 - proceedings.mlr.press

Sample efficiency and exploration remain major challenges in online reinforcement learning
(RL). A powerful approach that can be applied to address these issues is the inclusion of …

Cited by 72 - Related articles - All 6 versions
