Related articles - Academic resource search

What matters in on-policy reinforcement learning? a large-scale empirical study

M Andrychowicz, A Raichuk, P Stańczyk… - arXiv preprint arXiv …, 2020 - arxiv.org

In recent years, on-policy reinforcement learning (RL) has been successfully applied to
many different continuous control tasks. While RL algorithms are often conceptually simple …

Cited by 215 · Related articles · All 5 versions

[PDF] openreview.net

What matters for on-policy deep actor-critic methods? a large-scale study

M Andrychowicz, A Raichuk, P Stańczyk… - International …, 2021 - openreview.net

In recent years, reinforcement learning (RL) has been successfully applied to many different
continuous control tasks. While RL algorithms are often conceptually simple, their state-of …

Cited by 166 · Related articles · All 5 versions

[PDF] arxiv.org

Evolving rewards to automate reinforcement learning

A Faust, A Francis, D Mehta - arXiv preprint arXiv:1905.07628, 2019 - arxiv.org

Many continuous control tasks have easily formulated objectives, yet using them directly as
a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many …

Cited by 59 · Related articles · All 5 versions

[PDF] arxiv.org

AWAC: Accelerating online reinforcement learning with offline datasets

A Nair, A Gupta, M Dalal, S Levine - arXiv preprint arXiv:2006.09359, 2020 - arxiv.org

Reinforcement learning (RL) provides an appealing formalism for learning control policies
from experience. However, the classic active formulation of RL necessitates a lengthy active …

Cited by 506 · Related articles · All 7 versions

[PDF] mlr.press

P3O: Policy-on policy-off policy optimization

R Fakoor, P Chaudhari… - Uncertainty in artificial …, 2020 - proceedings.mlr.press

On-policy reinforcement learning (RL) algorithms have high sample complexity while
off-policy algorithms are difficult to tune. Merging the two holds the promise to develop efficient …

Cited by 57 · Related articles · All 7 versions

[PDF] arxiv.org

Keep doing what worked: Behavioral modelling priors for offline reinforcement learning

NY Siegel, JT Springenberg, F Berkenkamp… - arXiv preprint arXiv …, 2020 - arxiv.org

Off-policy reinforcement learning algorithms promise to be applicable in settings where only
a fixed data-set (batch) of environment interactions is available and no new experience can …

Cited by 291 · Related articles · All 8 versions

[PDF] springer.com

Challenges of real-world reinforcement learning: definitions, benchmarks and analysis

G Dulac-Arnold, N Levine, DJ Mankowitz, J Li… - Machine Learning, 2021 - Springer

Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is
beginning to show some successes in real-world scenarios. However, much of the research …

Cited by 424 · Related articles · All 6 versions

[PDF] arxiv.org

Advantage-weighted regression: Simple and scalable off-policy reinforcement learning

XB Peng, A Kumar, G Zhang, S Levine - arXiv preprint arXiv:1910.00177, 2019 - arxiv.org

In this paper, we aim to develop a simple and scalable reinforcement learning algorithm that
uses standard supervised learning methods as subroutines. Our goal is an algorithm that …

Cited by 469 · Related articles · All 6 versions

[PDF] arxiv.org

The in-sample softmax for offline reinforcement learning

C Xiao, H Wang, Y Pan, A White, M White - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning (RL) agents can leverage batches of previously collected data to
extract a reasonable control policy. An emerging issue in this offline RL setting, however, is …

Cited by 25 · Related articles · All 3 versions

[PDF] arxiv.org

An empirical investigation of the challenges of real-world reinforcement learning

G Dulac-Arnold, N Levine, DJ Mankowitz, J Li… - arXiv preprint arXiv …, 2020 - arxiv.org

Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is
beginning to show some successes in real-world scenarios. However, much of the research …

Cited by 128 · Related articles · All 3 versions
