Advantage-weighted regression: Simple and scalable off-policy reinforcement learning

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

Cited by 220 · Related articles · All 9 versions

Learning robotic navigation from experience: principles, methods and recent results

S Levine, D Shah - … Transactions of the Royal Society B, 2023 - royalsocietypublishing.org

Navigation is one of the most heavily studied problems in robotics and is conventionally
approached as a geometric mapping and planning problem. However, real-world navigation …

Cited by 16 · Related articles · All 8 versions

Training diffusion models with reinforcement learning

K Black, M Janner, Y Du, I Kostrikov… - arXiv preprint arXiv …, 2023 - arxiv.org

Diffusion models are a class of flexible generative models trained with an approximation to
the log-likelihood objective. However, most use cases of diffusion models are not concerned …

Cited by 97 · Related articles · All 6 versions


Q-Transformer: Scalable offline reinforcement learning via autoregressive Q-functions

Y Chebotar, Q Vuong, K Hausman… - … on Robot Learning, 2023 - proceedings.mlr.press

In this work, we present a scalable reinforcement learning method for training multi-task
policies from large offline datasets that can leverage both human demonstrations and …

Cited by 39 · Related articles · All 6 versions


OpenChat: Advancing open-source language models with mixed-quality data

G Wang, S Cheng, X Zhan, X Li, S Song… - arXiv preprint arXiv …, 2023 - arxiv.org

Nowadays, open-source large language models like LLaMA have emerged. Recent
developments have incorporated supervised fine-tuning (SFT) and reinforcement learning …

Cited by 109 · Related articles · All 4 versions


IDQL: Implicit Q-learning as an actor-critic method with diffusion policies

P Hansen-Estruch, I Kostrikov, M Janner… - arXiv preprint arXiv …, 2023 - arxiv.org

Effective offline RL methods require properly handling out-of-distribution actions. Implicit
Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a …

Cited by 66 · Related articles · All 4 versions

Jump-start reinforcement learning

I Uchendu, T Xiao, Y Lu, B Zhu, M Yan… - International …, 2023 - proceedings.mlr.press

Reinforcement learning (RL) provides a theoretical framework for continuously improving an
agent's behavior via trial and error. However, efficiently learning policies from scratch can be …

Cited by 79 · Related articles · All 10 versions

Contrastive energy prediction for exact energy-guided diffusion sampling in offline reinforcement learning

C Lu, H Chen, J Chen, H Su, C Li… - … on Machine Learning, 2023 - proceedings.mlr.press

Guided sampling is a vital approach for applying diffusion models in real-world tasks that
embeds human-defined guidance during the sampling procedure. This paper considers a …

Cited by 32 · Related articles · All 7 versions

Constrained decision transformer for offline safe reinforcement learning

Z Liu, Z Guo, Y Yao, Z Cen, W Yu… - International …, 2023 - proceedings.mlr.press

Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the
environment. We aim to tackle a more challenging problem: learning a safe policy from an …

Cited by 34 · Related articles · All 7 versions

Q-learning decision transformer: Leveraging dynamic programming for conditional sequence modelling in offline RL

T Yamagata, A Khalil… - … on Machine Learning, 2023 - proceedings.mlr.press

Recent works have shown that tackling offline reinforcement learning (RL) with a conditional
policy produces promising results. The Decision Transformer (DT) combines the conditional …

Cited by 49 · Related articles · All 9 versions
