C Hu, H Huang, L Xu, X Chen, J Xu, S Chen… - arXiv e …, 2024 - ui.adsabs.harvard.edu
Transformer-based large language model (LLM) inference serving is now the backbone of
many cloud services. LLM inference consists of a prefill phase and a decode phase …