SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification | X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang, Z Zhang, RYY Wong, A Zhu, ... | arXiv preprint arXiv:2305.09781, 2023 | 69 | 2023 |
Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism | X Miao, Y Wang, Y Jiang, C Shi, X Nie, H Zhang, B Cui | arXiv preprint arXiv:2211.13878, 2022 | 36 | 2022 |
SpotServe: Serving Generative Large Language Models on Preemptible Instances | X Miao, C Shi, J Duan, X Xi, D Lin, B Cui, Z Jia | Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 24 | 2024 |
SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification | X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang, Z Zhang, RYY Wong, A Zhu, ... | Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 20 | 2024 |
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge | B Xiao, C Shi, X Nie, F Yang, X Deng, L Su, W Chen, B Cui | arXiv preprint arXiv:2405.00263, 2024 | 2 | 2024 |