Y Zhang, Y Hu, R Zhao, J Lui, H Chen - arXiv preprint arXiv:2412.03131, 2024 - arxiv.org
Large language models (LLMs) demonstrate exceptional performance but incur high serving
costs due to substantial memory demands, with the key-value (KV) cache being a primary …