On the opportunities and risks of foundation models R Bommasani, DA Hudson, E Adeli, R Altman, S Arora, S von Arx, ... arXiv preprint arXiv:2108.07258, 2021 | 3270 | 2021 |
A simple but tough-to-beat baseline for sentence embeddings S Arora, Y Liang, T Ma ICLR 2017, 2016 | 1641 | 2016 |
Learning imbalanced datasets with label-distribution-aware margin loss K Cao, C Wei, A Gaidon, N Arechiga, T Ma NeurIPS 2019; arXiv preprint arXiv:1906.07413, 2019 | 1553 | 2019 |
Generalization and Equilibrium in Generative Adversarial Nets (GANs) S Arora, R Ge, Y Liang, T Ma, Y Zhang ICML 2017; arXiv preprint arXiv:1703.00573, 2017 | 797 | 2017 |
MOPO: Model-based offline policy optimization T Yu, G Thomas, L Yu, S Ermon, JY Zou, S Levine, C Finn, T Ma Advances in Neural Information Processing Systems 33, 14129-14142, 2020 | 751 | 2020 |
Matrix Completion has No Spurious Local Minimum R Ge, JD Lee, T Ma NIPS 2016 (best student paper). arXiv preprint arXiv:1605.07272, 2016 | 703 | 2016 |
A latent variable model approach to PMI-based word embeddings S Arora, Y Li, Y Liang, T Ma, A Risteski Transactions of the Association for Computational Linguistics 4, 385-399, 2016 | 641* | 2016 |
Fine-tuning can distort pretrained features and underperform out-of-distribution A Kumar, A Raghunathan, R Jones, T Ma, P Liang arXiv preprint arXiv:2202.10054, 2022 | 514 | 2022 |
An explanation of in-context learning as implicit Bayesian inference SM Xie, A Raghunathan, P Liang, T Ma arXiv preprint arXiv:2111.02080, 2021 | 482 | 2021 |
Provable bounds for learning some deep representations S Arora, A Bhaskara, R Ge, T Ma International conference on machine learning, 584-592, 2014 | 446 | 2014 |
Identity Matters in Deep Learning M Hardt, T Ma ICLR 2017, 2016 | 433 | 2016 |
Fixup initialization: Residual learning without normalization H Zhang, YN Dauphin, T Ma arXiv preprint arXiv:1901.09321, 2019 | 365 | 2019 |
Verified uncertainty calibration A Kumar, PS Liang, T Ma Advances in Neural Information Processing Systems 32, 2019 | 347 | 2019 |
Gradient descent learns linear dynamical systems M Hardt, T Ma, B Recht Journal of Machine Learning Research 19 (29), 1-44, 2018 | 346 | 2018 |
Finding Approximate Local Minima for Nonconvex Optimization in Linear Time N Agarwal, Z Allen-Zhu, B Bullins, E Hazan, T Ma STOC 2017, 2016 | 343* | 2016 |
Algorithmic Regularization in Over-parameterized Matrix Recovery and Neural Networks with Quadratic Activations Y Li, T Ma, H Zhang COLT 2018 (best paper); arXiv preprint arXiv:1712.09203, 2017 | 335* | 2017 |
What learning algorithm is in-context learning? Investigations with linear models E Akyürek, D Schuurmans, J Andreas, T Ma, D Zhou arXiv preprint arXiv:2211.15661, 2022 | 324 | 2022 |
Towards explaining the regularization effect of initial large learning rate in training neural networks Y Li, C Wei, T Ma Advances in neural information processing systems 32, 2019 | 321 | 2019 |
Learning one-hidden-layer neural networks with landscape design R Ge, JD Lee, T Ma ICLR 2017; arXiv preprint arXiv:1711.00501, 2017 | 298 | 2017 |
Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel C Wei, JD Lee, Q Liu, T Ma arXiv preprint arXiv:1810.05369, 2019 | 281* | 2019 |