T Xie, T Li, W Zhu, W Han, Y Zhao - arXiv preprint arXiv:2409.17834, 2024 - arxiv.org
Due to their substantial sizes, large language models (LLMs) are typically deployed within a
single-backbone multi-tenant framework. In this setup, a single instance of an LLM …