This article presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream Natural Language …
Z Jiang, X Ma, W Chen - arXiv preprint arXiv:2406.15319, 2024 - arxiv.org
In the traditional RAG framework, the basic retrieval units are normally short. Common retrievers such as DPR typically work with 100-word Wikipedia paragraphs. Such a design …
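To make the retrieval granularity this snippet refers to concrete, the sketch below splits documents into roughly 100-word units and ranks them against a query by embedding cosine similarity. The `embed` function is a stand-in assumption for a dense encoder such as DPR; none of this is taken from the cited paper.

```python
import numpy as np

def chunk_words(text, size=100):
    """Split a document into ~100-word retrieval units (DPR-style granularity)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Placeholder embedder; a real system would use a trained dense encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def retrieve(query, documents, top_k=3):
    """Rank all short units from all documents by cosine similarity to the query."""
    units = [u for doc in documents for u in chunk_words(doc)]
    q = embed(query)
    scores = [float(q @ embed(u)) for u in units]
    order = np.argsort(scores)[::-1][:top_k]
    return [(units[i], scores[i]) for i in order]
```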
M Jovanovic, P Voss - arXiv preprint arXiv:2404.18311, 2024 - arxiv.org
Real-time learning concerns the ability of learning systems to acquire knowledge over time, enabling their adaptation and generalization to novel tasks. It is a critical ability for …
Extending large language models to effectively handle long contexts requires instruction fine-tuning on input sequences of similar length. To address this, we present LongAlign--a recipe …
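One common ingredient in long-context fine-tuning pipelines is packing several instruction examples into one long training sequence; the sketch below is a generic illustration under that assumption, not the recipe described in the cited paper.

```python
def greedy_pack(example_lengths, max_len=65536):
    """Greedily pack instruction examples into long training sequences (generic illustration)."""
    packs, current, used = [], [], 0
    for idx, length in enumerate(example_lengths):
        if used + length > max_len and current:
            packs.append(current)       # close the current pack and start a new one
            current, used = [], 0
        current.append(idx)
        used += length
    if current:
        packs.append(current)
    return packs   # each pack lists example indices concatenated into one training sequence

print(greedy_pack([30000, 20000, 40000, 5000, 64000], max_len=65536))
# -> [[0, 1], [2, 3], [4]]
```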
Current Large Language Models (LLMs) are not only limited to some maximum context length, but are also unable to robustly consume long inputs. To address these limitations …
Current long-context large language models (LLMs) can process inputs of up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words …
Large Language Models (LLMs) have made remarkable progress in processing extensive contexts, with the Key-Value (KV) cache playing a vital role in enhancing their performance …
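The KV cache mentioned here is the standard decoding optimization of storing each step's keys and values so that generation only computes attention for the newest token rather than re-encoding the whole prefix. The sketch below is a minimal single-head illustration with made-up projection matrices, not an implementation from the cited work.

```python
import numpy as np

d = 64                      # head dimension
Wq, Wk, Wv = (np.random.randn(d, d) * 0.02 for _ in range(3))
k_cache, v_cache = [], []   # grows by one entry per generated token

def decode_step(x_t):
    """Attend the newest token over all cached keys/values instead of recomputing the prefix."""
    q = x_t @ Wq
    k_cache.append(x_t @ Wk)
    v_cache.append(x_t @ Wv)
    K = np.stack(k_cache)           # (t, d) cached keys
    V = np.stack(v_cache)           # (t, d) cached values
    scores = K @ q / np.sqrt(d)     # attention logits for the new token only
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V              # context vector for the new position

for _ in range(5):
    out = decode_step(np.random.randn(d))
```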
Large Language Models (LLMs) have profoundly changed the world. The self-attention mechanism is key to the success of the transformers that underlie LLMs. However, the quadratic …
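The quadratic cost this snippet alludes to comes from the full N×N score matrix in scaled dot-product attention; the few lines below make that explicit. This is a generic, simplified illustration (queries, keys, and values are all taken to be the inputs themselves), not code from the cited paper.

```python
import numpy as np

def self_attention(X):
    """Naive scaled dot-product self-attention; the (N, N) score matrix is the quadratic term."""
    N, d = X.shape
    scores = X @ X.T / np.sqrt(d)    # (N, N): time and memory grow as O(N^2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X               # (N, d) contextualized representations

X = np.random.randn(1024, 64)
out = self_attention(X)   # the 1024 x 1024 score matrix alone holds ~1M entries
```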
Tensor Attention, a multi-view attention mechanism that captures high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix …
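To make "high-order correlations among multiple modalities" concrete, the toy sketch below forms a third-order score tensor in which each query attends jointly to pairs of positions drawn from two key sequences. This is a hedged construction under my own assumptions (names, shapes, and the einsum contraction are illustrative), not the Tensor Attention formulation from the cited paper.

```python
import numpy as np

def tensor_attention(Q, K1, K2, V):
    """Toy third-order attention: each query scores *pairs* of positions across two modalities."""
    n, d = Q.shape
    # (n, m1, m2) score tensor couples every query with a pair (i from K1, j from K2).
    scores = np.einsum("qd,id,jd->qij", Q, K1, K2) / d
    flat = scores.reshape(n, -1)
    weights = np.exp(flat - flat.max(axis=-1, keepdims=True))
    weights = (weights / weights.sum(axis=-1, keepdims=True)).reshape(scores.shape)
    # Aggregate values; for simplicity V is indexed by the second modality only.
    return np.einsum("qij,jd->qd", weights, V)

Q  = np.random.randn(4, 8)    # queries
K1 = np.random.randn(5, 8)    # keys from modality 1
K2 = np.random.randn(6, 8)    # keys from modality 2
V  = np.random.randn(6, 8)    # values aligned with modality 2
out = tensor_attention(Q, K1, K2, V)   # shape (4, 8)
```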