Are we learning yet? A meta-review of evaluation failures across machine learning

T Liao, R Taori, ID Raji, L Schmidt - Thirty-fifth Conference on …, 2021 - openreview.net
Many subfields of machine learning share a common stumbling block: evaluation. Advances
in machine learning often evaporate under closer scrutiny or turn out to be less widely …

Autonomous unmanned aerial vehicle navigation using reinforcement learning: A systematic review

F AlMahamid, K Grolinger - Engineering Applications of Artificial …, 2022 - Elsevier
There is an increasing demand for using Unmanned Aerial Vehicle (UAV), known as drones,
in different applications such as packages delivery, traffic monitoring, search and rescue …

Mastering diverse domains through world models

D Hafner, J Pasukonis, J Ba, T Lillicrap - arXiv preprint arXiv:2301.04104, 2023 - arxiv.org
Developing a general algorithm that learns to solve tasks across a wide range of
applications has been a fundamental challenge in artificial intelligence. Although current …

Magnetic control of tokamak plasmas through deep reinforcement learning

J Degrave, F Felici, J Buchli, M Neunert, B Tracey… - Nature, 2022 - nature.com
Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a
promising path towards sustainable energy. A core challenge is to shape and maintain a …

What matters in learning from offline human demonstrations for robot manipulation

A Mandlekar, D Xu, J Wong, S Nasiriany… - arXiv preprint arXiv …, 2021 - arxiv.org
Imitating human demonstrations is a promising approach to endow robots with various
manipulation capabilities. While recent advances have been made in imitation learning and …

Causal machine learning: A survey and open problems

J Kaddour, A Lynch, Q Liu, MJ Kusner… - arXiv preprint arXiv …, 2022 - arxiv.org
Causal Machine Learning (CausalML) is an umbrella term for machine learning methods
that formalize the data-generation process as a structural causal model (SCM). This …

Phasic policy gradient

KW Cobbe, J Hilton, O Klimov… - … on Machine Learning, 2021 - proceedings.mlr.press
We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework
which modifies traditional on-policy actor-critic methods by separating policy and value …

Benchmarking multi-agent deep reinforcement learning algorithms in cooperative tasks

G Papoudakis, F Christianos, L Schäfer… - arXiv preprint arXiv …, 2020 - arxiv.org
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used
evaluation tasks and criteria, making comparisons between approaches difficult. In this work …

The effects of reward misspecification: Mapping and mitigating misaligned models

A Pan, K Bhatia, J Steinhardt - arXiv preprint arXiv:2201.03544, 2022 - arxiv.org
Reward hacking--where RL agents exploit gaps in misspecified reward functions--has been
widely observed, but not yet systematically studied. To understand how reward hacking …

Decoupling value and policy for generalization in reinforcement learning

R Raileanu, R Fergus - International Conference on …, 2021 - proceedings.mlr.press
Standard deep reinforcement learning algorithms use a shared representation for the policy
and value function, especially when training directly from images. However, we argue that …