Playing hard exploration games by watching YouTube

Authors

Yusuf Aytar, Tobias Pfaff, David Budden, Thomas Paine, Ziyu Wang, Nando De Freitas

Publication date

2018

Journal

Advances in neural information processing systems

Description

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e., with access to the agent’s exact environment setup and the demonstrator’s action and reward trajectories. Here we propose a method that overcomes these limitations in two stages. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e., vision and sound). Second, we embed a single YouTube video in this representation to learn a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma’s Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.
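
The second stage described above, turning one embedded demonstration video into a reward signal, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration rather than something stated in the abstract: the class name `CheckpointReward`, the use of cosine similarity, and the `spacing`, `threshold`, and `bonus` values are all hypothetical, and the embeddings are hand-written vectors standing in for the output of a learned embedding network.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class CheckpointReward:
    """Hypothetical imitation reward built from one embedded demonstration.

    Every `spacing`-th demonstration embedding becomes a checkpoint. The agent
    earns `bonus` when its current embedding is close enough to the next
    unvisited checkpoint, which is then marked as reached.
    """

    def __init__(self, demo_embeddings, spacing=2, threshold=0.9, bonus=1.0):
        # Assumed values: spacing, threshold and bonus are illustrative only.
        self.checkpoints = demo_embeddings[::spacing]
        self.threshold = threshold
        self.bonus = bonus
        self.next_index = 0

    def __call__(self, agent_embedding):
        if self.next_index >= len(self.checkpoints):
            return 0.0  # demonstration fully followed, nothing left to reward
        target = self.checkpoints[self.next_index]
        if cosine_similarity(agent_embedding, target) > self.threshold:
            self.next_index += 1
            return self.bonus
        return 0.0


# Toy demonstration: four 2-D "embeddings" standing in for embedded video frames.
demo = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
reward = CheckpointReward(demo, spacing=2)
first = reward([1.0, 0.0])    # matches the first checkpoint
repeat = reward([1.0, 0.0])   # same state again, next checkpoint not reached
second = reward([0.1, 1.0])   # close to the second checkpoint
done = reward([0.0, 1.0])     # no checkpoints remain
```

Because rewards are tied to checkpoints visited in order, an agent in this sketch is rewarded for following the demonstration's progression and gains nothing by staying in one matching state; no environment reward is involved.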

Total citations

Cited by 306

Citations per year: 2018: 24; 2019: 47; 2020: 64; 2021: 57; 2022: 35; 2023: 56; 2024: 22

Scholar articles

Playing hard exploration games by watching YouTube

Y Aytar, T Pfaff, D Budden, T Paine, Z Wang… - Advances in neural information processing systems, 2018

Cited by 306. All 9 versions.