Regularized evolution for image classifier architecture search E Real, A Aggarwal, Y Huang, QV Le Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), 4780-4789, 2019 | 3292 | 2019 |
Scaling instruction-finetuned language models HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, Y Li, X Wang, ... Journal of Machine Learning Research 25 (70), 1-53, 2024 | 1835 | 2024 |
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism Y Huang, Y Cheng, A Bapna, O Firat, MX Chen, D Chen, HJ Lee, J Ngiam, ... Advances in Neural Information Processing Systems 32, 103-112, 2019 | 1523 | 2019 |
LaMDA: Language models for dialog applications R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ... arXiv preprint arXiv:2201.08239, 2022 | 1298 | 2022 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 976 | 2023 |
Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023 | 843 | 2023 |
GShard: Scaling giant models with conditional computation and automatic sharding D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ... International Conference on Learning Representations (ICLR), 2020 | 756 | 2020 |
Predictive coding Y Huang, RPN Rao Wiley Interdisciplinary Reviews: Cognitive Science 2 (5), 580-593, 2011 | 691 | 2011 |
GLaM: Efficient scaling of language models with mixture-of-experts N Du, Y Huang, AM Dai, S Tong, D Lepikhin, Y Xu, M Krikun, Y Zhou, ... International Conference on Machine Learning, 5547-5569, 2022 | 507* | 2022 |
Alpa: Automating Inter-and Intra-Operator Parallelism for Distributed Deep Learning L Zheng, Z Li, H Zhang, Y Zhuang, Z Chen, Y Huang, Y Wang, Y Xu, ... 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2022 | 207 | 2022 |
Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... arXiv preprint arXiv:1902.08295, 2019 | 202 | 2019 |
Scaling instruction-finetuned language models HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, E Li, X Wang, ... arXiv preprint arXiv:2210.11416, 2022 | 185* | 2022 |
Just pick a sign: Optimizing deep multitask models with gradient sign dropout Z Chen, J Ngiam, Y Huang, T Luong, H Kretzschmar, Y Chai, D Anguelov Advances in Neural Information Processing Systems 33, 2039-2050, 2020 | 167 | 2020 |
BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition Y Zhang, DS Park, W Han, J Qin, A Gulati, J Shor, A Jansen, Y Xu, ... IEEE Journal of Selected Topics in Signal Processing 16 (6), 1519-1532, 2022 | 155 | 2022 |
Mixture-of-experts with expert choice routing Y Zhou, T Lei, H Liu, N Du, Y Huang, V Zhao, AM Dai, QV Le, J Laudon Advances in Neural Information Processing Systems 35, 7103-7114, 2022 | 144 | 2022 |
GSPMD: general and scalable parallelization for ML computation graphs Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang, R Joshi, M Krikun, ... arXiv preprint arXiv:2105.04663, 2021 | 91 | 2021 |
Beyond distillation: Task-level mixture-of-experts for efficient inference S Kudugunta, Y Huang, A Bapna, M Krikun, D Lepikhin, MT Luong, O Firat arXiv preprint arXiv:2110.03742, 2021 | 80 | 2021 |
Designing effective sparse expert models B Zoph, I Bello, S Kumar, N Du, Y Huang, J Dean, N Shazeer, W Fedus arXiv preprint arXiv:2202.08906 2 (3), 17, 2022 | 75 | 2022 |
AlpaServe: Statistical multiplexing with model parallelism for deep learning serving Z Li, L Zheng, Y Zhong, V Liu, Y Sheng, X Jin, Y Huang, Z Chen, H Zhang, ... 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 68 | 2023 |
ST-MoE: Designing stable and transferable sparse expert models B Zoph, I Bello, S Kumar, N Du, Y Huang, J Dean, N Shazeer, W Fedus arXiv preprint arXiv:2202.08906, 2022 | 65 | 2022 |