A Review of Safe Reinforcement Learning: Methods, Theories and Applications

S Gu, L Yang, Y Du, G Chen, F Walter… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Reinforcement Learning (RL) has achieved tremendous success in many complex
decision-making tasks. However, safety concerns are raised during deploying RL in real-world …

Decision transformer: Reinforcement learning via sequence modeling

L Chen, K Lu, A Rajeswaran, K Lee… - Advances in neural …, 2021 - proceedings.neurips.cc
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence
modeling problem. This allows us to draw upon the simplicity and scalability of the …

Behavior Transformers: Cloning k modes with one stone

NM Shafiullah, Z Cui… - Advances in neural …, 2022 - proceedings.neurips.cc
While behavior learning has made impressive progress in recent times, it lags behind
computer vision and natural language processing due to its inability to leverage large …

Foundation models for decision making: Problems, methods, and opportunities

S Yang, O Nachum, Y Du, J Wei, P Abbeel… - arXiv preprint arXiv …, 2023 - arxiv.org
Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …

Imitating human behaviour with diffusion models

T Pearce, T Rashid, A Kanervisto, D Bignell… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have emerged as powerful generative models in the text-to-image domain.
This paper studies their application as observation-to-action models for imitating human …

Offline-to-online reinforcement learning via balanced replay and pessimistic Q-ensemble

S Lee, Y Seo, K Lee, P Abbeel… - Conference on Robot …, 2022 - proceedings.mlr.press
Recent advance in deep offline reinforcement learning (RL) has made it possible to train
strong robotic agents from offline datasets. However, depending on the quality of the trained …

Goal-conditioned imitation learning using score-based diffusion policies

M Reuss, M Li, X Jia, R Lioutikov - arXiv preprint arXiv:2304.02532, 2023 - arxiv.org
We propose a new policy representation based on score-based diffusion models (SDMs).
We apply our new policy representation in the domain of Goal-Conditioned Imitation …

Can Wikipedia help offline reinforcement learning?

M Reid, Y Yamada, SS Gu - arXiv preprint arXiv:2201.12122, 2022 - arxiv.org
Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of
large scale off-the-shelf datasets as well as high variance in transferability among different …

Offline reinforcement learning via high-fidelity generative behavior modeling

H Chen, C Lu, C Ying, H Su, J Zhu - arXiv preprint arXiv:2209.14548, 2022 - arxiv.org
In offline reinforcement learning, weighted regression is a common method to ensure the
learned policy stays close to the behavior policy and to prevent selecting out-of-sample …

Representation matters: Offline pretraining for sequential decision making

M Yang, O Nachum - International Conference on Machine …, 2021 - proceedings.mlr.press
The recent success of supervised learning methods on ever larger offline datasets has
spurred interest in the reinforcement learning (RL) field to investigate whether the same …