Exploring generalization in deep learning. B Neyshabur, S Bhojanapalli, D McAllester, N Srebro. Advances in Neural Information Processing Systems 30, 2017. (Cited by 1347)
Large batch optimization for deep learning: Training BERT in 76 minutes. Y You, J Li, S Reddi, J Hseu, S Kumar, S Bhojanapalli, X Song, J Demmel, ... arXiv preprint arXiv:1904.00962, 2019. (Cited by 956)
A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks. B Neyshabur, S Bhojanapalli, N Srebro. arXiv preprint arXiv:1707.09564, 2017. (Cited by 643)
Towards understanding the role of over-parametrization in generalization of neural networks. B Neyshabur, Z Li, S Bhojanapalli, Y LeCun, N Srebro. arXiv preprint arXiv:1805.12076, 2018. (Cited by 582)
Implicit regularization in matrix factorization. S Gunasekar, BE Woodworth, S Bhojanapalli, B Neyshabur, N Srebro. Advances in Neural Information Processing Systems 30, 2017. (Cited by 516)
Global optimality of local search for low rank matrix recovery. S Bhojanapalli, B Neyshabur, N Srebro. Advances in Neural Information Processing Systems, 3873-3881, 2016. (Cited by 434)
Understanding robustness of transformers for image classification. S Bhojanapalli, A Chakrabarti, D Glasner, D Li, T Unterthiner, A Veit. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021. (Cited by 376)
Does label smoothing mitigate label noise? M Lukasik, S Bhojanapalli, A Menon, S Kumar. International Conference on Machine Learning, 6448-6458, 2020. (Cited by 348)
Are transformers universal approximators of sequence-to-sequence functions? C Yun, S Bhojanapalli, AS Rawat, SJ Reddi, S Kumar. arXiv preprint arXiv:1912.10077, 2019. (Cited by 295)
Dropping convexity for faster semi-definite optimization. S Bhojanapalli, A Kyrillidis, S Sanghavi. Conference on Learning Theory, 530-582, 2016. (Cited by 182)
Coherent matrix completion. Y Chen, S Bhojanapalli, S Sanghavi, R Ward. arXiv preprint arXiv:1306.2979, 2013. (Cited by 145)
Universal matrix completion. S Bhojanapalli, P Jain. International Conference on Machine Learning, 1881-1889, 2014. (Cited by 124)
Modifying memories in transformer models. C Zhu, AS Rawat, M Zaheer, S Bhojanapalli, D Li, F Yu, S Kumar. arXiv preprint arXiv:2012.00363, 2020. (Cited by 106)
Stabilizing GAN training with multiple random projections. B Neyshabur, S Bhojanapalli, A Chakrabarti. arXiv preprint arXiv:1705.07831, 2017. (Cited by 102)
Completing any low-rank matrix, provably. Y Chen, S Bhojanapalli, S Sanghavi, R Ward. The Journal of Machine Learning Research 16 (1), 2999-3034, 2015. (Cited by 94)
Low-rank bottleneck in multi-head attention models. S Bhojanapalli, C Yun, AS Rawat, S Reddi, S Kumar. International Conference on Machine Learning, 864-873, 2020. (Cited by 75)
Coping with label shift via distributionally robust optimisation. J Zhang, A Menon, A Veit, S Bhojanapalli, S Kumar, S Sra. arXiv preprint arXiv:2010.12230, 2020. (Cited by 68)
O(n) connections are expressive enough: Universal approximability of sparse transformers. C Yun, YW Chang, S Bhojanapalli, AS Rawat, S Reddi, S Kumar. Advances in Neural Information Processing Systems 33, 13783-13794, 2020. (Cited by 63)
A simple and effective positional encoding for transformers. PC Chen, H Tsai, S Bhojanapalli, HW Chung, YW Chang, CS Ferng. arXiv preprint arXiv:2104.08698, 2021. (Cited by 48)
The lazy neuron phenomenon: On emergence of activation sparsity in transformers. Z Li, C You, S Bhojanapalli, D Li, AS Rawat, SJ Reddi, K Ye, F Chern, ... arXiv preprint arXiv:2210.06313, 2022. (Cited by 45)