Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformer-based Large Language Models (LLMs) have been applied in diverse areas
such as knowledge bases, human interfaces, and dynamic agents, marking a stride …

Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond

J Yang, H Jin, R Tang, X Han, Q Feng, H Jiang… - ACM Transactions on …, 2024 - dl.acm.org
This article presents a comprehensive and practical guide for practitioners and end-users
working with Large Language Models (LLMs) in their downstream Natural Language …

LongRAG: Enhancing retrieval-augmented generation with long-context LLMs

Z Jiang, X Ma, W Chen - arXiv preprint arXiv:2406.15319, 2024 - arxiv.org
In the traditional RAG framework, the basic retrieval units are normally short. Common
retrievers such as DPR typically work with 100-word Wikipedia paragraphs. Such a design …
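
To make the notion of a longer retrieval unit concrete, here is a minimal Python sketch, not the LongRAG implementation: it greedily merges consecutive short passages (e.g. 100-word paragraphs) into larger units under a word budget. The function name group_passages and the budget value are illustrative assumptions.

```python
# Minimal sketch (not the LongRAG implementation): grouping short ~100-word
# passages into longer retrieval units under a word budget, so a long-context
# LLM can consume fewer, larger units per query.

from typing import List

def group_passages(passages: List[str], max_words: int = 4000) -> List[str]:
    """Greedily merge consecutive short passages into long retrieval units."""
    units, current, count = [], [], 0
    for p in passages:
        words = len(p.split())
        if current and count + words > max_words:
            units.append(" ".join(current))
            current, count = [], 0
        current.append(p)
        count += words
    if current:
        units.append(" ".join(current))
    return units

# Example: 50 short passages of ~100 words each collapse into a handful of units.
passages = [f"passage {i} " + "word " * 100 for i in range(50)]
print(len(group_passages(passages, max_words=1000)))  # roughly 6 units instead of 50
```

Fewer, larger units return less fragmented evidence per query, shifting the burden of reading long passages from the retriever onto the long-context LLM.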

Trends and challenges of real-time learning in large language models: A critical review

M Jovanovic, P Voss - arXiv preprint arXiv:2404.18311, 2024 - arxiv.org
Real-time learning concerns the ability of learning systems to acquire knowledge over time,
enabling their adaptation and generalization to novel tasks. It is a critical ability for …

LongAlign: A recipe for long context alignment of large language models

Y Bai, X Lv, J Zhang, Y He, J Qi, L Hou, J Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
Extending large language models to effectively handle long contexts requires instruction
fine-tuning on input sequences of similar length. To address this, we present LongAlign, a recipe …
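
As a rough illustration of the kind of data preparation such a recipe involves, here is a generic greedy packing sketch in Python; it is an assumption for illustration, not necessarily LongAlign's exact strategy. The name pack_examples and the 32,768-token capacity are hypothetical.

```python
# Generic sketch (not necessarily LongAlign's exact recipe): packing
# variable-length instruction examples into fixed-size training sequences so
# long-context fine-tuning batches are not dominated by padding.

from typing import List, Tuple

def pack_examples(lengths: List[int], capacity: int = 32768) -> List[List[int]]:
    """Greedy first-fit-decreasing packing: returns groups of example indices per pack."""
    packs: List[Tuple[int, List[int]]] = []   # (remaining capacity, member indices)
    for idx, n in sorted(enumerate(lengths), key=lambda x: -x[1]):
        for i, (rem, members) in enumerate(packs):
            if n <= rem:
                packs[i] = (rem - n, members + [idx])
                break
        else:
            packs.append((max(capacity - n, 0), [idx]))  # new pack for examples that fit nowhere
    return [members for _, members in packs]

lengths = [30000, 12000, 9000, 8000, 5000, 2000, 1500, 800]
print(pack_examples(lengths, capacity=32768))  # -> [[0, 5], [1, 2, 3, 6, 7], [4]]
```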

A human-inspired reading agent with gist memory of very long contexts

KH Lee, X Chen, H Furuta, J Canny… - arXiv preprint arXiv …, 2024 - arxiv.org
Current Large Language Models (LLMs) are not only limited to some maximum context
length, but are also unable to robustly consume long inputs. To address these limitations …
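
A hedged sketch of the general gist-memory idea, not the paper's agent: split a long document into pages, store a short gist of each, and re-read only the pages whose gists look relevant to the question. The helpers summarize and is_relevant are hypothetical stand-ins for LLM calls.

```python
# Hedged sketch of a gist-memory loop (not the paper's agent): summarize pages
# into short gists, then re-read only the pages whose gists look relevant.
# `summarize` and `is_relevant` are hypothetical stand-ins for LLM calls.

from typing import Callable, List

def build_gist_memory(pages: List[str], summarize: Callable[[str], str]) -> List[str]:
    return [summarize(p) for p in pages]

def answer_with_gists(question: str, pages: List[str], gists: List[str],
                      is_relevant: Callable[[str, str], bool]) -> List[str]:
    """Return the original pages whose gists look relevant to the question."""
    return [pages[i] for i, g in enumerate(gists) if is_relevant(question, g)]

# Toy stand-ins: keep the first sentence as a "gist", match on shared words.
pages = ["The treaty was signed in 1648. Many clauses followed.",
         "Attention cost grows quadratically. Several fixes exist."]
gists = build_gist_memory(pages, summarize=lambda p: p.split(".")[0])
print(answer_with_gists("When was the treaty signed?", pages, gists,
                        is_relevant=lambda q, g: bool(set(q.lower().split()) & set(g.lower().split()))))
```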

LongWriter: Unleashing 10,000+ word generation from long context LLMs

Y Bai, J Zhang, X Lv, L Zheng, S Zhu, L Hou… - arXiv preprint arXiv …, 2024 - arxiv.org
Current long-context large language models (LLMs) can process inputs up to 100,000
tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words …

SnapKV: LLM knows what you are looking for before generation

Y Li, Y Huang, B Yang, B Venkitesh, A Locatelli… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have made remarkable progress in processing extensive
contexts, with the Key-Value (KV) cache playing a vital role in enhancing their performance …
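
For context on what compressing a KV cache before generation can look like, here is a simplified numpy sketch in the spirit of the snippet, not the paper's exact algorithm: prefix positions are scored by the attention mass they receive from a trailing observation window of queries, and only the top-scoring key/value pairs plus the window are kept. The window size, budget, and function name compress_kv are illustrative assumptions.

```python
# Simplified sketch of KV-cache compression (not SnapKV's exact algorithm):
# score each prompt position by the attention it receives from a trailing
# observation window of queries, then keep only the top-k key/value pairs
# plus the window itself for subsequent decoding.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress_kv(K, V, Q, window: int = 16, keep: int = 64):
    """K, V, Q: (seq_len, d). Returns a compressed (K', V') pair."""
    obs_Q = Q[-window:]                                           # observation-window queries
    scores = softmax(obs_Q @ K.T / np.sqrt(K.shape[1]), axis=-1)  # (window, seq_len)
    votes = scores[:, :-window].sum(axis=0)                       # attention mass per prefix position
    top = np.sort(np.argsort(votes)[-keep:])                      # most-attended prefix positions
    idx = np.concatenate([top, np.arange(K.shape[0] - window, K.shape[0])])
    return K[idx], V[idx]

rng = np.random.default_rng(0)
K, V, Q = (rng.normal(size=(1024, 64)) for _ in range(3))
Kc, Vc = compress_kv(K, V, Q)
print(Kc.shape)  # (80, 64): 64 kept positions + the 16-token observation window
```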

Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers

J Gu, Y Liang, H Liu, Z Shi, Z Song, J Yin - arXiv preprint arXiv:2405.05219, 2024 - arxiv.org
Large Language Models (LLMs) have profoundly changed the world. Their self-attention
mechanism is the key to the success of transformers in LLMs. However, the quadratic …
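
As a baseline for why sub-quadratic alternatives matter, the sketch below (illustration only, not the conv-basis method) times standard self-attention, which materializes an n-by-n score matrix, so cost grows roughly fourfold each time the sequence length doubles.

```python
# Illustration only (not the conv-basis method): standard self-attention builds
# an n x n score matrix, so time and memory grow quadratically with sequence
# length n. This is the bottleneck that sub-quadratic methods aim to avoid.

import time
import numpy as np

def attention(Q, K, V):
    S = Q @ K.T / np.sqrt(Q.shape[1])                 # (n, n) scores: the O(n^2) term
    S = np.exp(S - S.max(axis=-1, keepdims=True))
    return (S / S.sum(axis=-1, keepdims=True)) @ V

for n in (1024, 2048, 4096):
    X = np.random.default_rng(0).normal(size=(n, 64)).astype(np.float32)
    t = time.perf_counter()
    attention(X, X, X)
    print(n, f"{time.perf_counter() - t:.3f}s")       # expect roughly 4x per doubling of n
```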

Tensor attention training: Provably efficient learning of higher-order transformers

J Gu, Y Liang, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2405.16411, 2024 - arxiv.org
Tensor Attention, a multi-view attention mechanism that captures high-order correlations among
multiple modalities, can overcome the representational limitations of classical matrix …
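
To ground the phrase "high-order correlations", here is a toy third-order attention sketch in numpy; the formulation is an assumption for illustration and not necessarily the one analyzed in the paper: each query attends jointly over pairs of positions from two key sequences, producing an n-by-m-by-m score tensor instead of the usual n-by-n matrix.

```python
# Toy third-order attention (illustrative assumption, not necessarily the
# paper's formulation): each query attends over PAIRS of positions drawn from
# two key sequences, e.g. two modalities, via an n x m x m score tensor.

import numpy as np

def tensor_attention(Q, K1, K2, V, scale=None):
    """Q: (n, d); K1, K2: (m, d); V: (m*m, d), one value row per key pair."""
    n, d = Q.shape
    scale = scale or d ** -0.5
    # scores[i, j, k] = <q_i, k1_j * k2_k>, the elementwise product of the two keys
    scores = np.einsum("id,jd,kd->ijk", Q, K1, K2) * scale   # (n, m, m)
    flat = scores.reshape(n, -1)                             # flatten the pair axis
    w = np.exp(flat - flat.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                                             # (n, d)

rng = np.random.default_rng(0)
Q, K1, K2 = (rng.normal(size=(8, 16)) for _ in range(3))
V = rng.normal(size=(8 * 8, 16))   # in practice built from two value sequences; given directly here
print(tensor_attention(Q, K1, K2, V).shape)  # (8, 16)
```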