Offline reinforcement learning with Fisher divergence critic regularization

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

Cited by 322 Related articles All 9 versions

Is conditional generative modeling all you need for decision-making?

A Ajay, Y Du, A Gupta, J Tenenbaum… - arXiv preprint arXiv …, 2022 - arxiv.org

Recent improvements in conditional generative modeling have made it possible to generate
high-quality images from language descriptions alone. We investigate whether these …

Cited by 332 Related articles All 4 versions

Offline reinforcement learning with implicit Q-learning

I Kostrikov, A Nair, S Levine - arXiv preprint arXiv:2110.06169, 2021 - arxiv.org

Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that
improves over the behavior policy that collected the dataset, while at the same time …

Cited by 820 Related articles All 6 versions

Q-Transformer: Scalable offline reinforcement learning via autoregressive Q-functions

Y Chebotar, Q Vuong, K Hausman… - … on Robot Learning, 2023 - proceedings.mlr.press

In this work, we present a scalable reinforcement learning method for training multi-task
policies from large offline datasets that can leverage both human demonstrations and …

Cited by 76 Related articles All 6 versions

Diffusion policies as an expressive policy class for offline reinforcement learning

Z Wang, JJ Hunt, M Zhou - arXiv preprint arXiv:2208.06193, 2022 - arxiv.org

Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously
collected static dataset, is an important paradigm of RL. Standard RL methods often perform …

Cited by 298 Related articles All 6 versions

A minimalist approach to offline reinforcement learning

S Fujimoto, SS Gu - Advances in neural information …, 2021 - proceedings.neurips.cc

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.
Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms …

Cited by 823 Related articles All 6 versions

Behavior Transformers: Cloning k modes with one stone

NM Shafiullah, Z Cui… - Advances in neural …, 2022 - proceedings.neurips.cc

While behavior learning has made impressive progress in recent times, it lags behind
computer vision and natural language processing due to its inability to leverage large …

Cited by 179 Related articles All 6 versions

Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning

M Nakamoto, S Zhai, A Singh… - Advances in …, 2024 - proceedings.neurips.cc

A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …

Cited by 95 Related articles All 7 versions

Implicit behavioral cloning

P Florence, C Lynch, A Zeng… - … on Robot Learning, 2022 - proceedings.mlr.press

We find that across a wide range of robot policy learning scenarios, treating supervised
policy learning with an implicit model generally performs better, on average, than commonly …

Cited by 377 Related articles All 9 versions

COMBO: Conservative offline model-based policy optimization

T Yu, A Kumar, R Rafailov… - Advances in neural …, 2021 - proceedings.neurips.cc

Model-based reinforcement learning (RL) algorithms, which learn a dynamics
model from logged experience and perform conservative planning under the learned model …

Cited by 443 Related articles All 7 versions
