A momentumized, adaptive, dual averaged gradient method A Defazio, S Jelassi Journal of Machine Learning Research 23 (144), 1-34, 2022 | 75* | 2022 |
Vision transformers provably learn spatial structure S Jelassi, M Sander, Y Li Advances in Neural Information Processing Systems 35, 37822-37836, 2022 | 61 | 2022 |
Global convergence of neuron birth-death dynamics G Rotskoff, S Jelassi, J Bruna, E Vanden-Eijnden International Conference on Machine Learning, 2019 | 61* | 2019 |
A mean-field analysis of two-player zero-sum games C Domingo-Enrich, S Jelassi, A Mensch, G Rotskoff, J Bruna Advances in Neural Information Processing Systems 33, 20215-20226, 2020 | 55 | 2020 |
A permutation-equivariant neural network architecture for auction design J Rahme, S Jelassi, J Bruna, SM Weinberg Proceedings of the AAAI Conference on Artificial Intelligence 35 (6), 5664-5672, 2021 | 48 | 2021 |
Auction learning as a two-player game J Rahme, S Jelassi, SM Weinberg arXiv preprint arXiv:2006.05684, 2020 | 47 | 2020 |
Towards understanding how momentum improves generalization in deep learning S Jelassi, Y Li International Conference on Machine Learning, 9965-10040, 2022 | 32 | 2022 |
Smoothed analysis of the low-rank approach for smooth semidefinite programs T Pumir, S Jelassi, N Boumal Advances in Neural Information Processing Systems 31, 2018 | 28 | 2018 |
Repeat after me: Transformers are better than state space models at copying S Jelassi, D Brandfonbrener, SM Kakade, E Malach arXiv preprint arXiv:2402.01032, 2024 | 23 | 2024 |
Length generalization in arithmetic transformers S Jelassi, S d'Ascoli, C Domingo-Enrich, Y Wu, Y Li, F Charton arXiv preprint arXiv:2306.15400, 2023 | 21 | 2023 |
Towards closing the gap between the theory and practice of SVRG O Sebbouh, N Gazagnadou, S Jelassi, F Bach, R Gower Advances in Neural Information Processing Systems 32, 2019 | 20 | 2019 |
Dissecting adaptive methods in GANs S Jelassi, D Dobre, A Mensch, Y Li, G Gidel arXiv preprint arXiv:2210.04319, 2022 | 18* | 2022 |
Depth separation beyond radial functions L Venturi, S Jelassi, T Ozuch, J Bruna Journal of Machine Learning Research 23 (122), 1-56, 2022 | 17 | 2022 |
Extra-gradient with player sampling for faster convergence in n-player games S Jelassi, C Domingo-Enrich, D Scieur, A Mensch, J Bruna International Conference on Machine Learning, 4736-4745, 2020 | 13* | 2020 |
Depth Dependence of μP Learning Rates in ReLU MLPs S Jelassi, B Hanin, Z Ji, SJ Reddi, S Bhojanapalli, S Kumar arXiv preprint arXiv:2305.07810, 2023 | 3 | 2023 |
Universal Length Generalization with Turing Programs K Hou, D Brandfonbrener, S Kakade, S Jelassi, E Malach arXiv preprint arXiv:2407.03310, 2024 | | 2024 |
How Does Overparameterization Affect Features? AC Duzgun, S Jelassi, Y Li arXiv preprint arXiv:2407.00968, 2024 | | 2024 |
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models K Li, S Jelassi, H Zhang, S Kakade, M Wattenberg, D Brandfonbrener arXiv preprint arXiv:2402.14688, 2024 | | 2024 |
Algorithmic and Architectural Implicit Biases in Deep Learning S Jelassi Princeton University, 2023 | | 2023 |