Symbolic working memory enhances language models for complex rule application

S Wang, Z Wei, Y Choi, X Ren - arXiv preprint arXiv:2408.13654, 2024 - arxiv.org
Large Language Models (LLMs) have shown remarkable reasoning performance but
struggle with multi-step deductive reasoning involving a series of rule application steps …
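
The multi-step rule application this abstract points at can be pictured with a generic forward-chaining loop over an external fact store. Below is a minimal Python sketch assuming a toy rule format of (premises, conclusion) pairs; it is illustrative only, not the paper's symbolic working memory:

    # Minimal forward-chaining sketch: repeatedly apply "if premises then
    # conclusion" rules against an external store of facts. Illustrative
    # only; not the paper's implementation.
    def forward_chain(facts: set[str],
                      rules: list[tuple[frozenset[str], str]]) -> set[str]:
        """Apply rules until no new fact can be derived."""
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                if premises <= derived and conclusion not in derived:
                    derived.add(conclusion)  # one rule-application step
                    changed = True
        return derived

    # Example: deriving C from A takes two chained rule applications.
    rules = [(frozenset({"A"}), "B"), (frozenset({"B"}), "C")]
    print(forward_chain({"A"}, rules))  # {'A', 'B', 'C'}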

MemLong: Memory-Augmented Retrieval for Long Text Modeling

W Liu, Z Tang, J Li, K Chen, M Zhang - arXiv preprint arXiv:2408.16967, 2024 - arxiv.org
Recent advancements in Large Language Models (LLMs) have yielded remarkable success
across diverse fields. However, handling long contexts remains a significant challenge for …
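
The general memory-augmented-retrieval idea named in the title can be illustrated with a toy store of chunk embeddings queried by cosine similarity. The ChunkMemory class and its lookup below are assumptions for illustration, not MemLong's architecture:

    # Toy retrieval-from-memory sketch: store chunk embeddings, then fetch
    # the top-k most similar chunks for the current query vector.
    import numpy as np

    class ChunkMemory:
        def __init__(self):
            self.vecs, self.chunks = [], []

        def add(self, vec: np.ndarray, chunk: str) -> None:
            self.vecs.append(vec / np.linalg.norm(vec))
            self.chunks.append(chunk)

        def retrieve(self, query: np.ndarray, k: int = 2) -> list[str]:
            q = query / np.linalg.norm(query)
            sims = np.stack(self.vecs) @ q      # cosine similarity
            top = np.argsort(-sims)[:k]
            return [self.chunks[i] for i in top]

    mem = ChunkMemory()
    mem.add(np.array([1.0, 0.0]), "chunk about cats")
    mem.add(np.array([0.0, 1.0]), "chunk about dogs")
    print(mem.retrieve(np.array([0.9, 0.1]), k=1))  # ['chunk about cats']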

A controlled study on long context extension and generalization in LLMs

Y Lu, JN Yan, S Yang, JT Chiu, S Ren, F Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
Broad textual understanding and in-context learning require language models that utilize full
document contexts. Due to the implementation challenges associated with directly training …

SCA: Selective Compression Attention for Efficiently Extending the Context Window of Large Language Models

H Zheng, W Zhu, X Wang - Findings of the Association for …, 2024 - aclanthology.org
Large language models (LLMs) have achieved impressive performance across various
domains, but the limited context window and the expensive computational cost of processing …

Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads

X He, J Liu, S Chen - arXiv preprint arXiv:2501.15113, 2025 - arxiv.org
The KV cache is a widely used acceleration technique for large language model (LLM) inference. However, its memory requirement grows rapidly with input length. Previous …
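
The linear growth mentioned here is easy to make concrete: the cache stores one key and one value vector per layer, per KV head, per token, so its size scales directly with sequence length. A back-of-the-envelope calculation (the model shape is chosen arbitrarily for illustration):

    # KV cache size: 2 (K and V) x layers x kv_heads x head_dim x seq_len
    # x bytes per element. The model shape below is illustrative only.
    def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                       seq_len: int, dtype_bytes: int = 2) -> int:
        return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

    # e.g. a 32-layer model with 8 KV heads of dim 128, fp16 cache:
    for n in (4_096, 32_768, 131_072):
        print(f"{n:>7} tokens -> {kv_cache_bytes(32, 8, 128, n) / 2**30:.1f} GiB")
    # 0.5 GiB -> 4.0 GiB -> 16.0 GiB: memory grows linearly with input
    # length, which is the pressure such KV cache optimizations target.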

Designing for Inference in Future Generative Models

JN Yan - 2024 - search.proquest.com
The rapid evolution of generative models has significantly advanced artificial intelligence,
enabling the creation of human-like text, realistic images, and even scientific discoveries …