Feature-learning networks are consistent across widths at realistic scales. N Vyas, A Atanasov, B Bordelon, D Morwani, S Sainathan, C Pehlevan. Advances in Neural Information Processing Systems 36, 2024. Cited by 13.
Simplicity bias in 1-hidden layer neural networks. D Morwani, J Batra, P Jain, P Netrapalli. Advances in Neural Information Processing Systems 36, 2024. Cited by 8.
Feature emergence via margin maximization: case studies in algebraic tasks. D Morwani, BL Edelman, CA Oncescu, R Zhao, S Kakade. arXiv preprint arXiv:2311.07568, 2023. Cited by 6.
Inductive bias of gradient descent for weight normalized smooth homogeneous neural nets. D Morwani, HG Ramaswamy. International Conference on Algorithmic Learning Theory, 827-880, 2022. Cited by 3.
Beyond implicit bias: The insignificance of SGD noise in online learning. N Vyas, D Morwani, R Zhao, G Kaplun, S Kakade, B Barak. arXiv preprint arXiv:2306.08590, 2023. Cited by 2.
Using noise resilience for ranking generalization of deep neural networks. D Morwani, R Vashisht, HG Ramaswamy. arXiv preprint arXiv:2012.08854, 2020. Cited by 2.
Inductive bias of gradient descent for exponentially weight normalized smooth homogeneous neural nets. D Morwani, HG Ramaswamy. arXiv preprint arXiv:2010.12909, 2020. Cited by 1.
Deconstructing What Makes a Good Optimizer for Language Models. R Zhao, D Morwani, D Brandfonbrener, N Vyas, S Kakade. arXiv preprint arXiv:2407.07972, 2024.
A New Perspective on Shampoo's Preconditioner. D Morwani, I Shapira, N Vyas, E Malach, S Kakade, L Janson. arXiv preprint arXiv:2406.17748, 2024.
AdaMeM: Memory Efficient Momentum for Adafactor. N Vyas, D Morwani, SM Kakade. 2nd Workshop on Advancing Neural Network Training: Computational Efficiency …