A closer look at memorization in deep networks D Arpit, S Jastrzębski, N Ballas, D Krueger, E Bengio, MS Kanwal, ... ICML 2017 (arXiv preprint arXiv:1706.05394), 2017 | 1895 | 2017 |
On the spectral bias of deep neural networks N Rahaman, D Arpit, A Baratin, F Draxler, M Lin, FA Hamprecht, Y Bengio, ... ICML 2019 (arXiv preprint arXiv:1806.08734), 2018 | 1211 | 2018 |
Three factors influencing minima in SGD S Jastrzębski, Z Kenton, D Arpit, N Ballas, A Fischer, Y Bengio, A Storkey ICANN 2018 (arXiv preprint arXiv:1711.04623), 2017 | 504 | 2017 |
The Break-Even Point on Optimization Trajectories of Deep Neural Networks S Jastrzebski, M Szymczak, S Fort, D Arpit, J Tabor, K Cho, K Geras ICLR 2020 (arXiv preprint arXiv:2002.09572), 2020 | 150 | 2020 |
Normalization propagation: A parametric technique for removing internal covariate shift in deep networks D Arpit, Y Zhou, BU Kota, V Govindaraju ICML 2016 (arXiv preprint arXiv:1603.01431), 2016 | 143 | 2016 |
Residual connections encourage iterative inference S Jastrzebski, D Arpit, N Ballas, V Verma, T Che, Y Bengio ICLR 2018 (arXiv preprint arXiv:1710.04773), 2017 | 136 | 2017 |
A walk with SGD C Xing, D Arpit, C Tsirigotis, Y Bengio arXiv preprint arXiv:1802.08770, 2018 | 111 | 2018 |
Ensemble of averages: Improving model selection and boosting performance in domain generalization D Arpit, H Wang, Y Zhou, C Xiong NeurIPS 2022, 2021 | 105 | 2021 |
Why regularized auto-encoders learn sparse representation? D Arpit, Y Zhou, H Ngo, V Govindaraju ICML 2016 (arXiv preprint arXiv:1505.05561), 2015 | 92 | 2015 |
Deep Nets Don't Learn via Memorization D Krueger, N Ballas, S Jastrzebski, D Arpit, MS Kanwal, T Maharaj, ... ICLR 2017 Workshop, 2017 | 70 | 2017 |
Fraternal Dropout K Zolna, D Arpit, D Suhubdy, Y Bengio ICLR 2018 (arXiv preprint arXiv:1711.00066), 2017 | 60 | 2017 |
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets D Arpit, V Campos, Y Bengio NeurIPS 2019, 2019 | 56 | 2019 |
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization S Jastrzebski, D Arpit, O Astrand, G Kerg, H Wang, C Xiong, R Socher, ... ICML 2021, 2020 | 55 | 2020 |
h-detach: Modifying the LSTM Gradient Towards Better Optimization D Arpit, B Kanuparthi, G Kerg, NR Ke, I Mitliagkas, Y Bengio ICLR 2019 (arXiv preprint arXiv:1810.03023), 2018 | 46 | 2018 |
BOLAA: Benchmarking and orchestrating LLM-augmented autonomous agents Z Liu, W Yao, J Zhang, L Xue, S Heinecke, R Murthy, Y Feng, Z Chen, ... arXiv preprint arXiv:2308.05960, 2023 | 44 | 2023 |
Variational Bi-LSTMs S Shabanian, D Arpit, A Trischler, Y Bengio arXiv preprint arXiv:1711.05717, 2017 | 42 | 2017 |
Is joint training better for deep auto-encoders? Y Zhou, D Arpit, I Nwogu, V Govindaraju arXiv preprint arXiv:1405.1380, 2014 | 40 | 2014 |
Finding Flatter Minima with SGD S Jastrzębski, Z Kenton, D Arpit, N Ballas, A Fischer, Y Bengio, A Storkey ICLR 2018 Workshop, 2018 | 36 | 2018 |
The benefits of over-parameterization at initialization in deep ReLU networks D Arpit, Y Bengio arXiv preprint arXiv:1901.03611, 2019 | 34 | 2019 |
Retroformer: Retrospective large language agents with policy gradient optimization W Yao, S Heinecke, JC Niebles, Z Liu, Y Feng, L Xue, R Murthy, Z Chen, ... arXiv preprint arXiv:2308.02151, 2023 | 33 | 2023 |