AWAC: Accelerating online reinforcement learning with offline datasets

A Nair, A Gupta, M Dalal, S Levine - arXiv preprint arXiv:2006.09359, 2020 - arxiv.org
Reinforcement learning (RL) provides an appealing formalism for learning control policies
from experience. However, the classic active formulation of RL necessitates a lengthy active …

LAPO: Latent-variable advantage-weighted policy optimization for offline reinforcement learning

X Chen, A Ghadirzadeh, T Yu, J Wang… - Advances in …, 2022 - proceedings.neurips.cc
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new samples. This setting …

Latent-variable advantage-weighted policy optimization for offline RL

X Chen, A Ghadirzadeh, T Yu, Y Gao, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions. This setting …

A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

The in-sample softmax for offline reinforcement learning

C Xiao, H Wang, Y Pan, A White, M White - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning (RL) agents can leverage batches of previously collected data to
extract a reasonable control policy. An emerging issue in this offline RL setting, however, is …

D4RL: Datasets for deep data-driven reinforcement learning

J Fu, A Kumar, O Nachum, G Tucker… - arXiv preprint arXiv …, 2020 - arxiv.org
The offline reinforcement learning (RL) setting (also known as full batch RL), where a policy
is learned from a static dataset, is compelling as progress enables RL methods to take …

Keep doing what worked: Behavioral modelling priors for offline reinforcement learning

NY Siegel, JT Springenberg, F Berkenkamp… - arXiv preprint arXiv …, 2020 - arxiv.org
Off-policy reinforcement learning algorithms promise to be applicable in settings where only
a fixed data-set (batch) of environment interactions is available and no new experience can …

Deployment-efficient reinforcement learning via model-based offline optimization

T Matsushima, H Furuta, Y Matsuo, O Nachum… - arXiv preprint arXiv …, 2020 - arxiv.org
Most reinforcement learning (RL) algorithms assume online access to the environment, in
which one may readily interleave updates to the policy with experience collection using that …

A workflow for offline model-free robotic reinforcement learning

A Kumar, A Singh, S Tian, C Finn, S Levine - arXiv preprint arXiv …, 2021 - arxiv.org
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior
experience, without any online interaction. This can allow robots to acquire generalizable …

Evolving rewards to automate reinforcement learning

A Faust, A Francis, D Mehta - arXiv preprint arXiv:1905.07628, 2019 - arxiv.org
Many continuous control tasks have easily formulated objectives, yet using them directly as
a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many …