Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality WL Chiang, Z Li, Z Lin, Y Sheng, Z Wu, H Zhang, L Zheng, S Zhuang, ... See https://vicuna.lmsys.org (accessed 14 April 2023) 2 (3), 6, 2023 | 1639* | 2023 |
Judging llm-as-a-judge with mt-bench and chatbot arena L Zheng, WL Chiang, Y Sheng, S Zhuang, Z Wu, Y Zhuang, Z Lin, Z Li, ... Advances in Neural Information Processing Systems 36, 2024 | 1592* | 2024 |
Efficient memory management for large language model serving with pagedattention W Kwon, Z Li, S Zhuang, Y Sheng, L Zheng, CH Yu, J Gonzalez, H Zhang, ... Proceedings of the 29th Symposium on Operating Systems Principles, 611-626, 2023 | 574 | 2023 |
Train big, then compress: Rethinking model size for efficient training and inference of transformers Z Li, E Wallace, S Shen, K Lin, K Keutzer, D Klein, J Gonzalez International Conference on Machine Learning, 5958-5968, 2020 | 276 | 2020 |
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning L Zheng, Z Li, H Zhang, Y Zhuang, Z Chen, Y Huang, Y Wang, Y Xu, ... arXiv preprint arXiv:2201.12023, 2022 | 250 | 2022 |
Flexgen: High-throughput generative inference of large language models with a single gpu Y Sheng, L Zheng, B Yuan, Z Li, M Ryabinin, B Chen, P Liang, C Ré, ... International Conference on Machine Learning, 31094-31116, 2023 | 193 | 2023 |
Understanding and improving transformer from a multi-particle dynamic system point of view Y Lu, Z Li, D He, Z Sun, B Dong, T Qin, L Wang, TY Liu arXiv preprint arXiv:1906.02762, 2019 | 184 | 2019 |
Efficient training of bert by progressively stacking L Gong, D He, Z Li, T Qin, L Wang, T Liu International conference on machine learning, 2337-2346, 2019 | 140 | 2019 |
Fast structured decoding for sequence models Z Sun, Z Li, H Wang, D He, Z Lin, Z Deng Advances in Neural Information Processing Systems 32, 2019 | 116 | 2019 |
Terapipe: Token-level pipeline parallelism for training large-scale language models Z Li, S Zhuang, S Guo, D Zhuo, H Zhang, D Song, I Stoica International Conference on Machine Learning, 6543-6552, 2021 | 86 | 2021 |
AlpaServe: Statistical multiplexing with model parallelism for deep learning serving Z Li, L Zheng, Y Zhong, V Liu, Y Sheng, X Jin, Y Huang, Z Chen, H Zhang, ... 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 85 | 2023 |
Hint-based training for non-autoregressive machine translation Z Li, Z Lin, D He, F Tian, T Qin, L Wang, TY Liu | 77 | 2018 |
Towards binary-valued gates for robust lstm training Z Li, D He, F Tian, W Chen, T Qin, L Wang, T Liu International Conference on Machine Learning, 2995-3004, 2018 | 59 | 2018 |
Lmsys-chat-1m: A large-scale real-world llm conversation dataset L Zheng, WL Chiang, Y Sheng, T Li, S Zhuang, Z Wu, Y Zhuang, Z Li, ... arXiv preprint arXiv:2309.11998, 2023 | 52 | 2023 |
Hoplite: efficient and fault-tolerant collective communication for task-based distributed systems S Zhuang, Z Li, D Zhuo, S Wang, E Liang, R Nishihara, P Moritz, I Stoica Proceedings of the 2021 ACM SIGCOMM 2021 Conference, 641-656, 2021 | 26 | 2021 |
On optimizing the communication of model parallelism Y Zhuang, L Zheng, Z Li, E Xing, Q Ho, J Gonzalez, I Stoica, H Zhang, ... Proceedings of Machine Learning and Systems 5, 2023 | 21 | 2023 |
Fairness in serving large language models Y Sheng, S Cao, D Li, B Zhu, Z Li, D Zhuo, JE Gonzalez, I Stoica 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024 | 14 | 2024 |
Rearchitecting in-memory object stores for low latency D Zhuo, K Zhang, Z Li, S Zhuang, S Wang, A Chen, I Stoica Proceedings of the VLDB Endowment, 555-568, 2021 | 3 | 2021 |
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput X Liu, C Daniel, L Hu, W Kwon, Z Li, X Mo, A Cheung, Z Deng, I Stoica, ... arXiv preprint arXiv:2406.14066, 2024 | | 2024 |
Simple and Automatic Distributed Machine Learning on Ray H Zhang, Z Li, L Zheng, I Stoica Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data …, 2021 | | 2021 |