An investigation into neural net optimization via hessian eigenvalue density B Ghorbani, S Krishnan, Y Xiao International Conference on Machine Learning, 2232-2241, 2019 | 310 | 2019 |
Linearized two-layers neural networks in high dimension B Ghorbani, S Mei, T Misiakiewicz, A Montanari The Annals of Statistics 49 (2), 1029-1054, 2021 | 241 | 2021 |
When do neural networks outperform kernel methods? B Ghorbani, S Mei, T Misiakiewicz, A Montanari Advances in Neural Information Processing Systems 33, 2020 | 183 | 2020 |
Limitations of lazy training of two-layers neural network B Ghorbani, S Mei, T Misiakiewicz, A Montanari Advances in Neural Information Processing Systems, 9111-9121, 2019 | 141 | 2019 |
Scaling Laws for Neural Machine Translation B Ghorbani, O Firat, M Freitag, A Bapna, M Krikun, X Garcia, C Chelba, ... arXiv preprint arXiv:2109.07740, 2021 | 73 | 2021 |
Do Current Multi-Task Optimization Methods in Deep Learning Even Help? D Xin, B Ghorbani, J Gilmer, A Garg, O Firat Advances in Neural Information Processing Systems 35, 13597-13609, 2022 | 44 | 2022 |
Adaptive Gradient Methods at the Edge of Stability JM Cohen, B Ghorbani, S Krishnan, N Agarwal, S Medapati, M Badura, ... arXiv preprint arXiv:2207.14484, 2022 | 43 | 2022 |
An instability in variational inference for topic models B Ghorbani, H Javadi, A Montanari International Conference on Machine Learning, 2221-2231, 2019 | 39 | 2019 |
A Loss Curvature Perspective on Training Instabilities of Deep Learning Models J Gilmer, B Ghorbani, A Garg, S Kudugunta, B Neyshabur, D Cardoze, ... International Conference on Learning Representations, 2021 | 32 | 2021 |
Data Scaling Laws in NMT: The Effect of Noise and Architecture Y Bansal, B Ghorbani, A Garg, B Zhang, C Cherry, B Neyshabur, O Firat International Conference on Machine Learning, 1466-1482, 2022 | 31 | 2022 |
A Loss Curvature Perspective on Training Instability in Deep Learning J Gilmer, B Ghorbani, A Garg, S Kudugunta, B Neyshabur, D Cardoze, ... arXiv preprint arXiv:2110.04369, 2021 | 28 | 2021 |
Scaling laws for multilingual neural machine translation P Fernandes, B Ghorbani, X Garcia, M Freitag, O Firat International Conference on Machine Learning, 10053-10071, 2023 | 16 | 2023 |
Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation M Freitag, B Ghorbani, P Fernandes arXiv preprint arXiv:2305.09860, 2023 | 16 | 2023 |
Examining scaling and transfer of language model architectures for machine translation B Zhang, B Ghorbani, A Bapna, Y Cheng, X Garcia, J Shen, O Firat International Conference on Machine Learning, 26176-26192, 2022 | 11 | 2022 |
Optimal covariance estimation for condition number loss in the spiked model D Donoho, B Ghorbani Econometrics and Statistics, 2024 | 8 | 2024 |
Discussion of: “Nonparametric regression using deep neural networks with ReLU activation function” B Ghorbani, S Mei, T Misiakiewicz, A Montanari The Annals of Statistics 48 (4), 1898-1901, 2020 | 8 | 2020 |
Binarized Neural Machine Translation Y Zhang, A Garg, Y Cao, L Lew, B Ghorbani, Z Zhang, O Firat Advances in Neural Information Processing Systems 36, 2024 | 7 | 2024 |
The effect of network depth on the optimization landscape B Ghorbani, Y Xiao, S Krishnan ICML 2019 Workshop on Identifying and Understanding Deep Learning Phenomena, 2019 | 3 | 2019 |
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning D Choi, D Xin, H Dadkhahi, J Gilmer, A Garg, O Firat, CK Yeh, AM Dai, ... Thirty-seventh Conference on Neural Information Processing Systems, 2023 | 1 | 2023 |
A loss curvature perspective on training instability in deep learning J Gilmer, B Ghorbani, A Garg, SR Kudugunta, B Neyshabur, D Cardoze, ... | 1 | 2022 |