Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... Advances in neural information processing systems 30, 2017 | 125415 | 2017 |
TensorFlow: Large-scale machine learning on heterogeneous systems M Abadi, A Agarwal, P Barham, E Brevdo, Z Chen, C Citro, GS Corrado, ... | 30570* | 2015 |
Google's neural machine translation system: Bridging the gap between human and machine translation Y Wu, M Schuster, Z Chen, QV Le, M Norouzi, W Macherey, M Krikun, ... arXiv preprint arXiv:1609.08144, 2016 | 8676 | 2016 |
Reformer: The efficient transformer N Kitaev, Ł Kaiser, A Levskaya arXiv preprint arXiv:2001.04451, 2020 | 2464 | 2020 |
Evaluating large language models trained on code M Chen, J Tworek, H Jun, Q Yuan, HPDO Pinto, J Kaplan, H Edwards, ... arXiv preprint arXiv:2107.03374, 2021 | 2422 | 2021 |
GPT-4 technical report J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ... arXiv preprint arXiv:2303.08774, 2023 | 2328 | 2023 |
Image transformer N Parmar, A Vaswani, J Uszkoreit, L Kaiser, N Shazeer, A Ku, D Tran International conference on machine learning, 4055-4064, 2018 | 1901 | 2018 |
Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... Advances in neural information processing systems, 2017 | 1842 | 2017 |
Attention is all you need A Vaswani Advances in neural information processing systems 30, I, 2017 | 1684 | 2017 |
Rethinking attention with performers K Choromanski, V Likhosherstov, D Dohan, X Song, A Gane, T Sarlos, ... arXiv preprint arXiv:2009.14794, 2020 | 1440 | 2020 |
Training verifiers to solve math word problems K Cobbe, V Kosaraju, M Bavarian, M Chen, H Jun, L Kaiser, M Plappert, ... arXiv preprint arXiv:2110.14168, 2021 | 1407 | 2021 |
Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... arXiv preprint arXiv:1706.03762, 2017 | 1290 | 2017 |
Regularizing neural networks by penalizing confident output distributions G Pereyra, G Tucker, J Chorowski, Ł Kaiser, G Hinton arXiv preprint arXiv:1701.06548, 2017 | 1217 | 2017 |
Grammar as a foreign language O Vinyals, Ł Kaiser, T Koo, S Petrov, I Sutskever, G Hinton Advances in neural information processing systems 28, 2015 | 1127 | 2015 |
Multi-task sequence to sequence learning MT Luong, QV Le, I Sutskever, O Vinyals, L Kaiser arXiv preprint arXiv:1511.06114, 2015 | 945 | 2015 |
Generating Wikipedia by summarizing long sequences PJ Liu, M Saleh, E Pot, B Goodrich, R Sepassi, L Kaiser, N Shazeer arXiv preprint arXiv:1801.10198, 2018 | 941 | 2018 |
Universal transformers M Dehghani, S Gouws, O Vinyals, J Uszkoreit, Ł Kaiser arXiv preprint arXiv:1807.03819, 2018 | 932 | 2018 |
Model-based reinforcement learning for Atari L Kaiser, M Babaeizadeh, P Milos, B Osinski, RH Campbell, ... arXiv preprint arXiv:1903.00374, 2019 | 920 | 2019 |
Tensor2tensor for neural machine translation A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ... arXiv preprint arXiv:1803.07416, 2018 | 626 | 2018 |
Adding gradient noise improves learning for very deep networks A Neelakantan, L Vilnis, QV Le, I Sutskever, L Kaiser, K Kurach, J Martens arXiv preprint arXiv:1511.06807, 2015 | 590 | 2015 |