LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning

X Chen, A Ghadirzadeh, T Yu, J Wang… - Advances in …, 2022 - proceedings.neurips.cc
Offline reinforcement learning methods hold the promise of learning policies from
pre-collected datasets without the need to query the environment for new samples. This setting …

LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning

X Chen, A Ghadirzadeh, T Yu, J Wang, Y Gao… - Proceedings of the 36th …, 2022 - dl.acm.org
Offline reinforcement learning methods hold the promise of learning policies from
pre-collected datasets without the need to query the environment for new samples. This setting …

LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning

X Chen, A Ghadirzadeh, T Yu, J Wang… - Advances in Neural …, 2022 - openreview.net
Offline reinforcement learning methods hold the promise of learning policies from
pre-collected datasets without the need to query the environment for new samples. This setting …

LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning

X Chen, A Ghadirzadeh, T Yu, J Wang, Y Gao, W Li… - proceedings.nips.cc
Offline reinforcement learning methods hold the promise of learning policies from
pre-collected datasets without the need to query the environment for new samples. This setting …