T Huang, P Chen, K Gong, J Hawk, Z Bright… - arXiv preprint arXiv …, 2024 - arxiv.org
With the increasing popularity of large language model (LLM) backend systems, it is
common and necessary to deploy stable serverless serving of LLMs on multi-GPU clusters …