C Hu, H Huang, J Hu, J Xu, X Chen, T Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language model (LLM) serving has transformed from stateless to stateful systems,
utilizing techniques like context caching and disaggregated inference. These optimizations …