Critic regularized regression

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

被引用次数：264 相关文章所有 9 个版本

[PDF] royalsocietypublishing.org Full View

Learning robotic navigation from experience: principles, methods and recent results

S Levine, D Shah - … Transactions of the Royal Society B, 2023 - royalsocietypublishing.org

Navigation is one of the most heavily studied problems in robotics and is conventionally
approached as a geometric mapping and planning problem. However, real-world navigation …

被引用次数：17 相关文章所有 8 个版本

[PDF] arxiv.org

A generalist agent

S Reed, K Zolna, E Parisotto, SG Colmenarejo… - arXiv preprint arXiv …, 2022 - arxiv.org

Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …

被引用次数：830 相关文章所有 4 个版本

[PDF] arxiv.org

Offline reinforcement learning with implicit q-learning

I Kostrikov, A Nair, S Levine - arXiv preprint arXiv:2110.06169, 2021 - arxiv.org

Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that
improves over the behavior policy that collected the dataset, while at the same time …

被引用次数：695 相关文章所有 6 个版本

[PDF] neurips.cc

A minimalist approach to offline reinforcement learning

S Fujimoto, SS Gu - Advances in neural information …, 2021 - proceedings.neurips.cc

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.
Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms …

被引用次数：723 相关文章所有 6 个版本

[PDF] mlr.press

Q-transformer: Scalable offline reinforcement learning via autoregressive q-functions

Y Chebotar, Q Vuong, K Hausman… - … on Robot Learning, 2023 - proceedings.mlr.press

In this work, we present a scalable reinforcement learning method for training multi-task
policies from large offline datasets that can leverage both human demonstrations and …

被引用次数：55 相关文章所有 6 个版本

[PDF] mlr.press

Is pessimism provably efficient for offline rl?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

被引用次数：405 相关文章所有 7 个版本

[PDF] openreview.net

What matters in learning from offline human demonstrations for robot manipulation

A Mandlekar, D Xu, J Wong, S Nasiriany… - arXiv preprint arXiv …, 2021 - arxiv.org

Imitating human demonstrations is a promising approach to endow robots with various
manipulation capabilities. While recent advances have been made in imitation learning and …

被引用次数：335 相关文章所有 4 个版本

[PDF] neurips.cc

Mildly conservative q-learning for offline reinforcement learning

J Lyu, X Ma, X Li, Z Lu - Advances in Neural Information …, 2022 - proceedings.neurips.cc

Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …

被引用次数：90 相关文章所有 5 个版本

[PDF] arxiv.org

Idql: Implicit q-learning as an actor-critic method with diffusion policies

P Hansen-Estruch, I Kostrikov, M Janner… - arXiv preprint arXiv …, 2023 - arxiv.org

Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-
learning (IQL) addresses this by training a Q-function using only dataset actions through a …

被引用次数：81 相关文章所有 4 个版本

高级搜索

QQ 群

A survey on offline reinforcement learning: Taxonomy, review, and open problems

Learning robotic navigation from experience: principles, methods and recent results

A generalist agent

Offline reinforcement learning with implicit q-learning

A minimalist approach to offline reinforcement learning

Q-transformer: Scalable offline reinforcement learning via autoregressive q-functions

Is pessimism provably efficient for offline rl?

What matters in learning from offline human demonstrations for robot manipulation

Mildly conservative q-learning for offline reinforcement learning

Idql: Implicit q-learning as an actor-critic method with diffusion policies

引用