Mastering the game of Go with deep neural networks and tree search D Silver, A Huang, CJ Maddison, A Guez, L Sifre, G Van Den Driessche, ... Nature 529 (7587), 484-489, 2016 | 19034 | 2016 |
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play D Silver, T Hubert, J Schrittwieser, I Antonoglou, M Lai, A Guez, M Lanctot, ... Science 362 (6419), 1140-1144, 2018 | 6444* | 2018 |
Dueling Network Architectures for Deep Reinforcement Learning Z Wang, T Schaul, M Hessel, H van Hasselt, M Lanctot, N de Freitas arXiv preprint arXiv:1511.06581, 2016 | 4882 | 2016 |
Value-decomposition networks for cooperative multi-agent learning based on team reward P Sunehag, G Lever, A Gruslys, WM Czarnecki, V Zambaldi, M Jaderberg, ... Proceedings of the 17th international conference on autonomous agents and …, 2018 | 1683* | 2018 |
Deep Q-learning from Demonstrations T Hester, M Vecerik, O Pietquin, M Lanctot, T Schaul, B Piot, D Horgan, ... Association for the Advancement of Artificial Intelligence (AAAI), 2018 | 1210 | 2018 |
Multi-agent Reinforcement Learning in Sequential Social Dilemmas JZ Leibo, V Zambaldi, M Lanctot, J Marecki, T Graepel AAMAS, 2017 | 881 | 2017 |
A unified game-theoretic approach to multiagent reinforcement learning M Lanctot, V Zambaldi, A Gruslys, A Lazaridou, K Tuyls, J Pérolat, D Silver, ... arXiv preprint arXiv:1711.00832, 2017 | 727 | 2017 |
The Hanabi challenge: A new frontier for AI research N Bard, JN Foerster, S Chandar, N Burch, M Lanctot, HF Song, E Parisotto, ... Artificial Intelligence 280, 103216, 2020 | 395 | 2020 |
Fictitious Self-Play in Extensive-Form Games J Heinrich, M Lanctot, D Silver International Conference on Machine Learning, 2015 | 373 | 2015 |
Monte Carlo sampling for regret minimization in extensive games M Lanctot, K Waugh, M Zinkevich, M Bowling Advances in neural information processing systems 22, 1078-1086, 2009 | 367 | 2009 |
Memory-efficient backpropagation through time A Gruslys, R Munos, I Danihelka, M Lanctot, A Graves Advances in Neural Information Processing Systems, 4125-4133, 2016 | 254* | 2016 |
OpenSpiel: A Framework for Reinforcement Learning in Games M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ... arXiv preprint arXiv:1908.09453, 2019 | 252 | 2019 |
Emergent Communication through Negotiation K Cao, A Lazaridou, M Lanctot, JZ Leibo, K Tuyls, S Clark arXiv preprint arXiv:1804.03980, 2018 | 183 | 2018 |
Actor-critic policy optimization in partially observable multiagent environments S Srinivasan, M Lanctot, V Zambaldi, J Pérolat, K Tuyls, R Munos, ... Advances in Neural Information Processing Systems, 3422-3435, 2018 | 160 | 2018 |
Mastering the game of Stratego with model-free multiagent reinforcement learning J Pérolat, B De Vylder, D Hennes, E Tarassov, F Strub, V de Boer, ... Science 378 (6623), 990-996, 2022 | 158 | 2022 |
Convolution by evolution: Differentiable pattern producing networks C Fernando, D Banarse, M Reynolds, F Besse, D Pfau, M Jaderberg, ... Proceedings of the Genetic and Evolutionary Computation Conference 2016, 109-116, 2016 | 132 | 2016 |
α-Rank: Multi-Agent Evaluation by Evolution S Omidshafiei, C Papadimitriou, G Piliouras, K Tuyls, M Rowland, ... Scientific reports 9 (1), 9937, 2019 | 130 | 2019 |
Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research JZ Leibo, E Hughes, M Lanctot, T Graepel arXiv preprint arXiv:1903.00742, 2019 | 115 | 2019 |
Real-Time Monte-Carlo Tree Search in Ms Pac-Man T Pepels, MHM Winands, M Lanctot IEEE Transactions on Computational Intelligence and AI in Games, 2014 | 114 | 2014 |
Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization M Johanson, N Bard, M Lanctot, RG Gibson, M Bowling AAMAS, 837-846, 2012 | 107 | 2012 |