Diffusion model is an effective planner and data synthesizer for multi-task reinforcement learning

H He, C Bai, K Xu, Z Yang, W Zhang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Diffusion models have demonstrated highly-expressive generative capabilities in vision and
NLP. Recent studies in reinforcement learning (RL) have shown that diffusion models are …

Synthetic experience replay

C Lu, P Ball, YW Teh… - Advances in Neural …, 2024 - proceedings.neurips.cc
A key theme in the past decade has been that when large neural networks and large
datasets combine, they can produce remarkable results. In deep reinforcement learning (RL) …

Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Anti-exploration by random network distillation

A Nikulin, V Kurenkov, D Tarasov… - … on Machine Learning, 2023 - proceedings.mlr.press
Despite the success of Random Network Distillation (RND) in various domains, it was shown
to be insufficiently discriminative to be used as an uncertainty estimator for penalizing out-of …

Revisiting the minimalist approach to offline reinforcement learning

D Tarasov, V Kurenkov, A Nikulin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent years have witnessed significant advancements in offline reinforcement learning
(RL), resulting in the development of numerous algorithms with varying degrees of …

Unleashing the power of pre-trained language models for offline reinforcement learning

R Shi, Y Liu, Y Ze, SS Du, H Xu - arXiv preprint arXiv:2310.20587, 2023 - arxiv.org
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected
datasets. In real-world scenarios, data collection could be costly and risky; therefore, offline …

Towards robust offline reinforcement learning under diverse data corruption

R Yang, H Zhong, J Xu, A Zhang, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Offline reinforcement learning (RL) presents a promising approach for learning reinforced
policies from offline datasets without the need for costly or unsafe interactions with the …

Uni-O4: Unifying online and offline deep reinforcement learning with multi-step on-policy optimization

K Lei, Z He, C Lu, K Hu, Y Gao, H Xu - arXiv preprint arXiv:2311.03351, 2023 - arxiv.org
Combining offline and online reinforcement learning (RL) is crucial for efficient and safe
learning. However, previous approaches treat offline and online learning as separate …

ACT: Empowering decision transformer with dynamic programming via advantage conditioning

CX Gao, C Wu, M Cao, R Kong, Z Zhang… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Decision Transformer (DT), which employs expressive sequence modeling techniques to
perform action generation, has emerged as a promising approach to offline policy …

Reinformer: Max-return sequence modeling for offline RL

Z Zhuang, D Peng, J Liu, Z Zhang, D Wang - arXiv preprint arXiv …, 2024 - arxiv.org
As a data-driven paradigm, offline reinforcement learning (RL) has been formulated as
sequence modeling that conditions on the hindsight information including returns, goal or …