W Liu, Z Tang, J Li, K Chen, M Zhang - arXiv preprint arXiv:2408.16967, 2024 - arxiv.org
Recent advancements in Large Language Models (LLMs) have yielded remarkable success across diverse fields. However, handling long contexts remains a significant challenge for …
Y Lu, JN Yan, S Yang, JT Chiu, S Ren, F Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
Broad textual understanding and in-context learning require language models that utilize full document contexts. Due to the implementation challenges associated with directly training …
H Zheng, W Zhu, X Wang - Findings of the Association for …, 2024 - aclanthology.org
Large language models (LLMs) have achieved impressive performance across various domains, but the limited context window and the expensive computational cost of processing …
X He, J Liu, S Chen - arXiv preprint arXiv:2501.15113, 2025 - arxiv.org
The KV cache is a widely used acceleration technique for large language model (LLM) inference. However, its memory requirement grows rapidly with input length. Previous …
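The memory growth this snippet refers to is linear in context length: the cache holds one key and one value vector per token, per layer, per attention head. A minimal back-of-envelope sketch of that scaling follows; the model dimensions are assumptions (roughly a Llama-2-7B-style configuration in fp16), not figures from the cited paper.

    # Back-of-envelope KV cache size: a hypothetical sketch, not tied to
    # any of the papers listed here. Default dimensions are assumptions
    # (Llama-2-7B-like: 32 layers, 32 KV heads, head_dim 128, fp16).

    def kv_cache_bytes(seq_len: int,
                       num_layers: int = 32,
                       num_kv_heads: int = 32,
                       head_dim: int = 128,
                       dtype_bytes: int = 2,   # 2 bytes for fp16/bf16
                       batch_size: int = 1) -> int:
        """Bytes held by the KV cache: one K and one V vector per token,
        per layer, per KV head -- hence linear growth in seq_len."""
        per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
        return batch_size * seq_len * per_token

    for n in (4_096, 32_768, 131_072):
        print(f"{n:>7} tokens -> {kv_cache_bytes(n) / 2**30:.1f} GiB")

Under these assumptions the cache costs about 0.5 MiB per token, i.e. 2 GiB at a 4K context but 64 GiB at 128K, which is why long contexts make KV cache compression and eviction techniques attractive.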
The rapid evolution of generative models has significantly advanced artificial intelligence, enabling the creation of human-like text, realistic images, and even scientific discoveries …