Planning with diffusion for flexible behavior synthesis

M Janner, Y Du, JB Tenenbaum, S Levine - arXiv preprint arXiv …, 2022 - arxiv.org
Model-based reinforcement learning methods often use learning only for the purpose of
estimating an approximate dynamics model, offloading the rest of the decision-making work …
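
A minimal sketch of the planning loop this framing implies: treat the whole trajectory as a single array and refine it by iterative denoising, pinning the first state to the current observation. The denoiser below is a random-map stand-in for the paper's trained diffusion model, and the step size is arbitrary.

    import numpy as np

    # Toy stand-in for a learned denoising network eps_theta(traj, t).
    # The actual method uses a trained diffusion model over trajectories;
    # this fixed random linear map is purely for illustration.
    rng = np.random.default_rng(0)
    H, D = 16, 4                      # horizon, state+action dimension
    W = rng.normal(scale=0.1, size=(H * D, H * D))

    def eps_theta(traj, t):
        return (W @ traj.ravel()).reshape(H, D)

    def plan(s0, n_steps=50):
        """Denoise a Gaussian trajectory into a plan, keeping the first
        state pinned to the observed state s0 (inpainting-style conditioning)."""
        traj = rng.normal(size=(H, D))
        for t in reversed(range(n_steps)):
            traj = traj - 0.1 * eps_theta(traj, t)   # one simplified denoising step
            traj[0, :len(s0)] = s0                   # condition on the current state
        return traj

    plan(np.zeros(2))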

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc
Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …
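
The core move is to flatten trajectories into token streams so that a standard autoregressive transformer can serve as both dynamics model and policy. A hedged sketch of the discretization side, with uniform per-dimension binning whose vocabulary size is illustrative rather than the paper's setting:

    import numpy as np

    N_BINS = 100  # illustrative per-dimension vocabulary size

    def tokenize(trajectory, low, high):
        """Map each (state, action, reward) dimension to a discrete token so a
        standard autoregressive transformer can model the whole sequence."""
        t = (trajectory - low) / (high - low)                  # normalize to [0, 1]
        return np.clip((t * N_BINS).astype(int), 0, N_BINS - 1).ravel()

    def detokenize(tokens, low, high, dim):
        centers = (tokens.reshape(-1, dim) + 0.5) / N_BINS
        return low + centers * (high - low)

    traj = np.array([[0.3, -1.2, 0.5], [0.1, 0.4, 1.0]])  # rows: (s, a, r) triples
    toks = tokenize(traj, low=-2.0, high=2.0)
    round_trip = detokenize(toks, -2.0, 2.0, dim=3)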

COMBO: Conservative offline model-based policy optimization

T Yu, A Kumar, R Rafailov… - Advances in neural …, 2021 - proceedings.neurips.cc
Model-based reinforcement learning (RL) algorithms, which learn a dynamics
model from logged experience and perform conservative planning under the learned model …
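
Conservatism in this family typically means pushing Q-values down on model-generated state-action pairs and up on pairs the dataset actually contains, alongside the usual Bellman backup. A toy tabular sketch of just that penalty term (the coefficient beta and the update form are a simplification, not the paper's exact objective):

    import numpy as np

    n_states, n_actions = 10, 4
    Q = np.zeros((n_states, n_actions))

    def conservative_update(Q, dataset_sa, model_sa, beta=1.0, lr=0.1):
        """Lower Q on (s, a) pairs sampled from model rollouts, raise it on
        pairs from the logged dataset; the Bellman backup is omitted here."""
        for s, a in model_sa:
            Q[s, a] -= lr * beta        # penalize out-of-distribution pairs
        for s, a in dataset_sa:
            Q[s, a] += lr * beta        # trust pairs the data actually contains
        return Q

    dataset_sa = [(0, 1), (2, 3)]       # from the logged data
    model_sa = [(5, 0), (7, 2)]         # from synthetic model rollouts
    Q = conservative_update(Q, dataset_sa, model_sa)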

Offline reinforcement learning with fisher divergence critic regularization

I Kostrikov, R Fergus, J Tompson… - … on Machine Learning, 2021 - proceedings.mlr.press
Many modern approaches to offline Reinforcement Learning (RL) utilize behavior
regularization, typically augmenting a model-free actor critic algorithm with a penalty …
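
One common shape for such a critic penalty is a squared norm on the critic's action-gradient, which keeps the critic from inventing sharp value peaks away from the data. A PyTorch sketch under assumed names (q, critic_penalty); it shows the general gradient-penalty pattern, not the paper's exact Fisher-divergence loss:

    import torch
    import torch.nn as nn

    q = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 1))  # toy critic

    def critic_penalty(states, actions):
        """Penalize the squared norm of the critic's action-gradient,
        discouraging sharp value peaks on actions the data does not support."""
        actions = actions.clone().requires_grad_(True)
        q_vals = q(torch.cat([states, actions], dim=-1)).sum()
        (grad,) = torch.autograd.grad(q_vals, actions, create_graph=True)
        return (grad ** 2).sum(dim=-1).mean()

    states = torch.randn(32, 4)
    actions = torch.randn(32, 2)
    loss = critic_penalty(states, actions)   # added to the usual TD loss
    loss.backward()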

RAMBO-RL: Robust adversarial model-based offline reinforcement learning

M Rigter, B Lacerda, N Hawes - Advances in neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) aims to find performant policies from logged data without
further environment interaction. Model-based algorithms, which learn a model of the …
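
The adversarial ingredient is to train the dynamics model against the policy: fit the logged transitions while also nudging predicted next states toward outcomes the policy values poorly. A toy PyTorch sketch of that two-term objective (the weight lam and the value stand-in are assumptions for illustration):

    import torch
    import torch.nn as nn

    model = nn.Linear(6, 4)        # toy dynamics: (s, a) -> next state
    value = nn.Linear(4, 1)        # stand-in for the policy's value estimate

    def rambo_style_loss(s, a, s_next, lam=0.1):
        """Fit the data (MSE term) while adversarially lowering the value the
        policy would obtain under the model's predicted next states."""
        pred = model(torch.cat([s, a], dim=-1))
        fit = ((pred - s_next) ** 2).mean()
        adversarial = value(pred).mean()     # model descends the policy's value
        return fit + lam * adversarial

    s, a, s_next = torch.randn(32, 4), torch.randn(32, 2), torch.randn(32, 4)
    loss = rambo_style_loss(s, a, s_next)
    loss.backward()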

RvS: What is essential for offline RL via supervised learning?

S Emmons, B Eysenbach, I Kostrikov… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent work has shown that supervised learning alone, without temporal difference (TD)
learning, can be remarkably effective for offline RL. When does this hold true, and which …
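
Concretely, "supervised learning alone" here means conditional behavior cloning: regress logged actions on the state plus an outcome such as a goal or reward-to-go. A minimal sketch with arbitrary sizes and architecture:

    import torch
    import torch.nn as nn

    policy = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 2))

    def rvs_step(states, outcomes, actions, opt):
        """One conditional-BC step: predict the logged action from the state
        plus the outcome (goal or reward-to-go) the trajectory achieved."""
        pred = policy(torch.cat([states, outcomes], dim=-1))
        loss = ((pred - actions) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    states, outcomes, actions = torch.randn(64, 4), torch.randn(64, 1), torch.randn(64, 2)
    rvs_step(states, outcomes, actions, opt)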

Temporal difference learning for model predictive control

N Hansen, X Wang, H Su - arXiv preprint arXiv:2203.04955, 2022 - arxiv.org
Data-driven model predictive control has two key advantages over model-free methods: a
potential for improved sample efficiency through model learning, and better performance as …
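
The combination is to plan over a short horizon with the learned model and cap each rollout with a TD-learned terminal value, so the planner need not look far ahead. A toy random-shooting sketch in which the model, reward, and value functions are stand-ins:

    import numpy as np

    rng = np.random.default_rng(0)
    step = lambda s, a: 0.9 * s + a          # toy dynamics model
    reward = lambda s, a: -np.sum(s ** 2)    # toy reward
    value = lambda s: -np.sum(s ** 2)        # stand-in for a TD-learned value

    def plan(s0, horizon=5, n_samples=64, gamma=0.99):
        """Score sampled action sequences by short model rollouts plus a
        terminal value estimate; return the first action of the best one."""
        best_ret, best_a0 = -np.inf, None
        for _ in range(n_samples):
            seq = rng.normal(scale=0.3, size=(horizon, s0.shape[0]))
            s, ret = s0, 0.0
            for t, a in enumerate(seq):
                ret += (gamma ** t) * reward(s, a)
                s = step(s, a)
            ret += (gamma ** horizon) * value(s)   # terminal value closes the horizon
            if ret > best_ret:
                best_ret, best_a0 = ret, seq[0]
        return best_a0

    plan(np.ones(3))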

Challenges of real-world reinforcement learning: definitions, benchmarks and analysis

G Dulac-Arnold, N Levine, DJ Mankowitz, J Li… - Machine Learning, 2021 - Springer
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is
beginning to show some successes in real-world scenarios. However, much of the research …

Driving into the future: Multiview visual forecasting and planning with world model for autonomous driving

Y Wang, J He, L Fan, H Li, Y Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
In autonomous driving, predicting future events in advance and evaluating the foreseeable
risks empowers autonomous vehicles to plan their actions, enhancing safety and efficiency …

Elastic decision transformer

YH Wu, X Wang, M Hamaya - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper introduces Elastic Decision Transformer (EDT), a significant
advancement over the existing Decision Transformer (DT) and its variants. Although DT …
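
For context, the Decision Transformer paradigm that EDT builds on treats control as return-conditioned sequence generation: feed (return-to-go, state, action) tokens and decode the next action. A minimal inference-loop sketch of that base paradigm, with a stand-in network and toy environment; it does not show EDT's specific modification:

    import numpy as np

    rng = np.random.default_rng(0)
    predict_action = lambda rtg, states, actions: rng.normal(size=2)  # stand-in net

    def dt_rollout(env_step, s0, target_return, horizon=10):
        """Return-conditioned decoding: keep a running return-to-go and history,
        and query the sequence model for the next action at each step."""
        rtg, states, actions = [target_return], [s0], []
        for _ in range(horizon):
            a = predict_action(rtg, states, actions)
            s, r = env_step(states[-1], a)
            actions.append(a); states.append(s)
            rtg.append(rtg[-1] - r)            # decrement the return target
        return actions

    toy_env = lambda s, a: (s + 0.1 * a, -float(np.sum(s ** 2)))
    dt_rollout(toy_env, np.zeros(2), target_return=100.0)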