- Academic Resource Search

1 result found (0.02 seconds)

Variational Delayed Policy Optimization

Q Wu, SS Zhan, Y Wang, Y Wang, CW Lin, C Lv… - arXiv preprint arXiv …, 2024 - arxiv.org

In environments with delayed observation, state augmentation by including actions within
the delay window is adopted to retrieve the Markovian property to enable reinforcement learning …
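
The state augmentation mentioned in the snippet can be illustrated with a minimal sketch. It is not taken from the paper: the class name `DelayAugmenter`, its methods, and the padding convention for the first few steps are all assumptions made for illustration. With an observation delay of `delay` steps, the agent pairs the most recent state it has actually observed with the actions taken since then, so the pair can serve as a Markovian state.

```python
from collections import deque


class DelayAugmenter:
    """Hypothetical helper: builds (last observed state, recent actions) pairs."""

    def __init__(self, delay, initial_state, pad_action=None):
        if delay < 1:
            raise ValueError("delay must be at least 1")
        self.delay = delay
        # True environment states; the oldest entry is the one the agent can see.
        self.states = deque([initial_state], maxlen=delay + 1)
        # Actions inside the delay window, padded until enough real actions exist
        # (the padding convention is an assumption).
        self.actions = deque([pad_action] * delay, maxlen=delay)

    def record(self, action, next_state):
        """Store the action just taken and the (not yet observed) next state."""
        self.actions.append(action)
        self.states.append(next_state)

    def augmented_state(self):
        """Return the delayed observation together with the actions taken since."""
        return (self.states[0], tuple(self.actions))


aug = DelayAugmenter(delay=2, initial_state=10)
aug.record("a0", 11)
aug.record("a1", 12)
aug.record("a2", 13)
print(aug.augmented_state())  # the state from two steps ago plus the last two actions
```

A policy trained on these augmented pairs never needs the current true state, at the cost of an input that grows with the length of the delay window.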
