What matters in on-policy reinforcement learning? a large-scale empirical study

T Liao, R Taori, ID Raji, L Schmidt - Thirty-fifth Conference on …, 2021 - openreview.net

Many subfields of machine learning share a common stumbling block: evaluation. Advances
in machine learning often evaporate under closer scrutiny or turn out to be less widely …

Cited by 113 · Related articles · All 6 versions

Autonomous unmanned aerial vehicle navigation using reinforcement learning: A systematic review

F AlMahamid, K Grolinger - Engineering Applications of Artificial …, 2022 - Elsevier

There is an increasing demand for using Unmanned Aerial Vehicle (UAV), known as drones,
in different applications such as packages delivery, traffic monitoring, search and rescue …

Cited by 66 · Related articles · All 11 versions

Mastering diverse domains through world models

D Hafner, J Pasukonis, J Ba, T Lillicrap - arXiv preprint arXiv:2301.04104, 2023 - arxiv.org

Developing a general algorithm that learns to solve tasks across a wide range of
applications has been a fundamental challenge in artificial intelligence. Although current …

Cited by 386 · Related articles · All 2 versions

Magnetic control of tokamak plasmas through deep reinforcement learning

J Degrave, F Felici, J Buchli, M Neunert, B Tracey… - Nature, 2022 - nature.com

Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a
promising path towards sustainable energy. A core challenge is to shape and maintain a …

Cited by 779 · Related articles · All 13 versions

What matters in learning from offline human demonstrations for robot manipulation

A Mandlekar, D Xu, J Wong, S Nasiriany… - arXiv preprint arXiv …, 2021 - arxiv.org

Imitating human demonstrations is a promising approach to endow robots with various
manipulation capabilities. While recent advances have been made in imitation learning and …

Cited by 347 · Related articles · All 4 versions

Causal machine learning: A survey and open problems

J Kaddour, A Lynch, Q Liu, MJ Kusner… - arXiv preprint arXiv …, 2022 - arxiv.org

Causal Machine Learning (CausalML) is an umbrella term for machine learning methods
that formalize the data-generation process as a structural causal model (SCM). This …

Cited by 137 · Related articles · All 2 versions

Phasic policy gradient

KW Cobbe, J Hilton, O Klimov… - … on Machine Learning, 2021 - proceedings.mlr.press

We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework
which modifies traditional on-policy actor-critic methods by separating policy and value …

Cited by 179 · Related articles · All 5 versions

Benchmarking multi-agent deep reinforcement learning algorithms in cooperative tasks

G Papoudakis, F Christianos, L Schäfer… - arXiv preprint arXiv …, 2020 - arxiv.org

Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used
evaluation tasks and criteria, making comparisons between approaches difficult. In this work …

Cited by 248 · Related articles · All 5 versions

The effects of reward misspecification: Mapping and mitigating misaligned models

A Pan, K Bhatia, J Steinhardt - arXiv preprint arXiv:2201.03544, 2022 - arxiv.org

Reward hacking--where RL agents exploit gaps in misspecified reward functions--has been
widely observed, but not yet systematically studied. To understand how reward hacking …

Cited by 125 · Related articles · All 5 versions

Decoupling value and policy for generalization in reinforcement learning

R Raileanu, R Fergus - International Conference on …, 2021 - proceedings.mlr.press

Standard deep reinforcement learning algorithms use a shared representation for the policy
and value function, especially when training directly from images. However, we argue that …

Cited by 104 · Related articles · All 6 versions
