P Zeng, Z Ning, J Zhao, W Cui, M Xu, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
We survey the large language model (LLM) serving area to understand the intricate
dynamics between cost-efficiency and accuracy, which is magnified by the growing need for …