MiniHack the planet: A sandbox for open-ended reinforcement learning research

R Kirk, A Zhang, E Grefenstette, T Rocktäschel - Journal of Artificial …, 2023 - jair.org

The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to
produce RL algorithms whose policies generalise well to novel unseen situations at …

Cited by 328 · Related articles · All 9 versions

[PDF] neurips.cc

SMACv2: An improved benchmark for cooperative multi-agent reinforcement learning

B Ellis, J Cook, S Moalla… - Advances in …, 2024 - proceedings.neurips.cc

The availability of challenging benchmarks has played a key role in the recent progress of
machine learning. In cooperative multi-agent reinforcement learning, the StarCraft Multi …

Cited by 51 · Related articles · All 6 versions

[PDF] mlr.press

Evolving curricula with regret-based environment design

J Parker-Holder, M Jiang, M Dennis… - International …, 2022 - proceedings.mlr.press

Training generally-capable agents with reinforcement learning (RL) remains a significant
challenge. A promising avenue for improving the robustness of RL agents is through the use …

Cited by 94 · Related articles · All 5 versions

[PDF] mlr.press

A generalist neural algorithmic learner

B Ibarz, V Kurin, G Papamakarios… - Learning on graphs …, 2022 - proceedings.mlr.press

The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks,
especially in a way that generalises out of distribution. While recent years have seen a surge …

Cited by 56 · Related articles · All 5 versions

[PDF] arxiv.org

Human-timescale adaptation in an open-ended task space

AA Team, J Bauer, K Baumli, S Baveja… - arXiv preprint arXiv …, 2023 - arxiv.org

Foundation models have shown impressive adaptation and scalability in supervised and
self-supervised learning problems, but so far these successes have not fully translated to …

Cited by 59 · Related articles · All 2 versions

[PDF] mlr.press

Human-timescale adaptation in an open-ended task space

J Bauer, K Baumli, F Behbahani… - International …, 2023 - proceedings.mlr.press

Foundation models have shown impressive adaptation and scalability in supervised and
self-supervised learning problems, but so far these successes have not fully translated to …

Cited by 25 · Related articles · All 4 versions

[PDF] neurips.cc

Exploration via elliptical episodic bonuses

M Henaff, R Raileanu, M Jiang… - Advances in Neural …, 2022 - proceedings.neurips.cc

In recent years, a number of reinforcement learning (RL) methods have been proposed to
explore complex environments which differ across episodes. In this work, we show that the …

Cited by 27 · Related articles · All 6 versions

[PDF] arxiv.org

Motif: Intrinsic motivation from artificial intelligence feedback

M Klissarov, P D'Oro, S Sodhani, R Raileanu… - arXiv preprint arXiv …, 2023 - arxiv.org

Exploring rich environments and evaluating one's actions without prior knowledge is
immensely challenging. In this paper, we propose Motif, a general method to interface such …

Cited by 28 · Related articles · All 6 versions

[PDF] neurips.cc

Improving intrinsic exploration with language abstractions

J Mu, V Zhong, R Raileanu, M Jiang… - Advances in …, 2022 - proceedings.neurips.cc

Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse.
One common solution is to use intrinsic rewards to encourage agents to explore their …

Cited by 55 · Related articles · All 7 versions

[PDF] arxiv.org

Mind the gap: Challenges of deep learning approaches to theory of mind

J Aru, A Labash, O Corcoll, R Vicente - Artificial Intelligence Review, 2023 - Springer

Theory of Mind (ToM) is an essential ability of humans to infer the mental states of
others. Here we provide a coherent summary of the potential, current progress, and …

Cited by 28 · Related articles · All 7 versions
