Thomas William Anthony
Google DeepMind
Verified email at google.com
Title
Cited by
Year
Thinking fast and slow with deep learning and tree search
TW Anthony, Z Tian, D Barber
Advances in Neural Information Processing Systems, 5360-5370, 2017
Cited by: 398 · Year: 2017
Openspiel: A framework for reinforcement learning in games
M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ...
arXiv preprint arXiv:1908.09453, 2019
Cited by: 253 · Year: 2019
Mastering the game of Stratego with model-free multiagent reinforcement learning
J Perolat, B De Vylder, D Hennes, E Tarassov, F Strub, V de Boer, ...
Science 378 (6623), 990-996, 2022
Cited by: 164 · Year: 2022
From Poincaré recurrence to convergence in imperfect information games: Finding equilibrium via regularization
J Perolat, R Munos, JB Lespiau, S Omidshafiei, M Rowland, P Ortega, ...
International Conference on Machine Learning, 8525-8535, 2021
Cited by: 82 · Year: 2021
On the role of planning in model-based deep reinforcement learning
JB Hamrick, AL Friesen, F Behbahani, A Guez, F Viola, S Witherspoon, ...
arXiv preprint arXiv:2011.04021, 2020
Cited by: 74 · Year: 2020
Learning to Play No-Press Diplomacy with Best Response Policy Iteration
T Anthony, T Eccles, A Tacchetti, J Kramár, I Gemp, TC Hudson, N Porcel, ...
arXiv preprint arXiv:2006.04635, 2020
Cited by: 50 · Year: 2020
Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
TW Anthony, R Nishihara, P Moritz, T Salimans, J Schulman
arXiv preprint arXiv:1904.03646, 2019
Cited by: 31 · Year: 2019
OpenSpiel: A Framework for Reinforcement Learning in Games. CoRR abs/1908.09453 (2019)
M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ...
arXiv preprint cs.LG/1908.09453, 2019
Cited by: 24 · Year: 2019
Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games
E Hughes, TW Anthony, T Eccles, JZ Leibo, D Balduzzi, Y Bachrach
arXiv preprint arXiv:2003.00799, 2020
Cited by: 22 · Year: 2020
Iterative Empirical Game Solving via Single Policy Best Response
MO Smith, T Anthony, MP Wellman
Cited by: 19*
Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent
I Gemp, R Savani, M Lanctot, Y Bachrach, T Anthony, R Everett, ...
arXiv preprint arXiv:2106.01285, 2021
Cited by: 18 · Year: 2021
Smooth markets: A basic mechanism for organizing gradient-based learners
D Balduzzi, WM Czarnecki, TW Anthony, IM Gemp, E Hughes, JZ Leibo, ...
arXiv preprint arXiv:2001.04678, 2020
Cited by: 16 · Year: 2020
Learning to play against any mixture of opponents
MO Smith, T Anthony, MP Wellman
Frontiers in Artificial Intelligence 6, 2023
Cited by: 14 · Year: 2023
Turbocharging solution concepts: Solving NEs, CEs and CCEs with neural equilibrium solvers
L Marris, I Gemp, T Anthony, A Tacchetti, S Liu, K Tuyls
Advances in Neural Information Processing Systems 35, 5586-5600, 2022
Cited by: 13 · Year: 2022
Expert iteration
TW Anthony
UCL (University College London), 2021
Cited by: 7 · Year: 2021
Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas
U Madhushani, KR McKee, JP Agapiou, JZ Leibo, R Everett, T Anthony, ...
arXiv preprint arXiv:2305.00768, 2023
Cited by: 4 · Year: 2023
Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning
M Lanctot, J Schultz, N Burch, MO Smith, D Hennes, T Anthony, J Perolat
arXiv preprint arXiv:2303.03196, 2023
Cited by: 4 · Year: 2023
Designing all-pay auctions using deep learning and multi-agent simulation
I Gemp, T Anthony, J Kramar, T Eccles, A Tacchetti, Y Bachrach
Scientific Reports 12 (1), 16937, 2022
Cited by: 4 · Year: 2022
Developing, evaluating and scaling learning agents in multi-agent environments
I Gemp, T Anthony, Y Bachrach, A Bhoopchand, K Bullard, J Connor, ...
AI Communications 35 (4), 271-284, 2022
Cited by: 4 · Year: 2022
Strategic Knowledge Transfer
MO Smith, T Anthony, MP Wellman
Journal of Machine Learning Research 24 (233), 1-96, 2023
Cited by: 3 · Year: 2023
Articles 1–20