All versions - Academic Resource Search

Lapo: Latent-variable advantage-weighted policy optimization for offline reinforcement learning

X Chen, A Ghadirzadeh, T Yu, J Wang… - Advances in …, 2022 - proceedings.neurips.cc

Offline reinforcement learning methods hold the promise of learning policies from pre-
collected datasets without the need to query the environment for new samples. This setting …

Cited by 18 Related articles

LAPO: latent-variable advantage-weighted policy optimization for offline reinforcement learning

X Chen, A Ghadirzadeh, T Yu, J Wang, Y Gao… - Proceedings of the 36th …, 2022 - dl.acm.org

Offline reinforcement learning methods hold the promise of learning policies from pre-
collected datasets without the need to query the environment for new samples. This setting …

LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning

X Chen, A Ghadirzadeh, T Yu, J Wang… - Advances in Neural …, 2022 - openreview.net

Offline reinforcement learning methods hold the promise of learning policies from pre-
collected datasets without the need to query the environment for new samples. This setting …

[PDF] nips.cc

[PDF] LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning

X Chen, A Ghadirzadeh, T Yu, J Wang, Y Gao, W Li… - proceedings.nips.cc

Offline reinforcement learning methods hold the promise of learning policies from pre-
collected datasets without the need to query the environment for new samples. This setting …
