A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Pre-trained language models for text generation: A survey

J Li, T Tang, WX Zhao, JY Nie, JR Wen - ACM Computing Surveys, 2024 - dl.acm.org
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …

Scan and snap: Understanding training dynamics and token composition in 1-layer transformer

Y Tian, Y Wang, B Chen, SS Du - Advances in Neural …, 2023 - proceedings.neurips.cc
Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …

YaRN: Efficient context window extension of large language models

B Peng, J Quesnelle, H Fan, E Shippole - arXiv preprint arXiv:2309.00071, 2023 - arxiv.org
Rotary Position Embeddings (RoPE) have been shown to effectively encode positional
information in transformer-based language models. However, these models fail to …
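
Since this snippet names Rotary Position Embeddings (RoPE) without showing the mechanism, here is a minimal NumPy sketch of the standard RoPE rotation that context-extension methods in this family then rescale; the function name and the base frequency of 10000 follow common convention and are assumptions, not details taken from the paper.

```python
# Minimal sketch of rotary position embeddings (RoPE): consecutive feature pairs
# of each query/key vector are rotated by a position-dependent angle, so the dot
# product of two rotated vectors depends only on their relative position.
import numpy as np

def apply_rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """x: (seq_len, dim) float query or key vectors, dim even.
    positions: (seq_len,) integer token positions."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) * 2.0 / dim)      # theta_i = base^(-2i/dim)
    angles = positions[:, None] * freqs[None, :]        # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                     # paired features
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x1 * cos - x2 * sin                  # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because attention scores between rotated queries and keys depend only on relative position, extension methods in this line of work adjust the angle schedule (e.g., by interpolating or rescaling the frequencies) rather than retraining positional information from scratch.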

LM-Infinite: Simple on-the-fly length generalization for large language models

C Han, Q Wang, W Xiong, Y Chen, H Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, there have been remarkable advancements in the performance of
Transformer-based Large Language Models (LLMs) across various domains. As these LLMs …

Griffin: Mixing gated linear recurrences with local attention for efficient language models

S De, SL Smith, A Fernando, A Botev… - arXiv preprint arXiv …, 2024 - arxiv.org
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long
sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with …
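
As a rough illustration of the "gated linear recurrences" named in the title, the sketch below shows the generic form such blocks build on: an elementwise, input-gated linear state update with no nonlinearity inside the recurrence. It is a simplified stand-in under assumed notation, not the exact recurrent block the paper proposes.

```python
# Generic gated linear recurrence: the hidden state is updated elementwise with a
# per-token, per-channel gate a[t] in (0, 1), keeping O(1) state per token.
import numpy as np

def gated_linear_recurrence(x: np.ndarray, a: np.ndarray) -> np.ndarray:
    """x, a: (seq_len, dim); a is the forget gate, x the gated input."""
    seq_len, dim = x.shape
    h = np.zeros(dim)
    out = np.empty_like(x, dtype=float)
    for t in range(seq_len):
        # Linear in h (no tanh over the recurrence), which is what allows the
        # whole sequence to be computed with a parallel scan at training time.
        h = a[t] * h + (1.0 - a[t]) * x[t]
        out[t] = h
    return out
```

The fixed-size state and linear update are what give this model family its fast inference and efficient scaling on long sequences, the property the abstract leads with.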

Leave no context behind: Efficient infinite context transformers with infini-attention

T Munkhdalai, M Faruqui, S Gopal - arXiv preprint arXiv:2404.07143, 2024 - arxiv.org
This work introduces an efficient method to scale Transformer-based Large Language
Models (LLMs) to infinitely long inputs with bounded memory and computation. A key …

Repeat after me: Transformers are better than state space models at copying

S Jelassi, D Brandfonbrener, SM Kakade… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers are the dominant architecture for sequence modeling, but there is growing
interest in models that use a fixed-size latent state that does not depend on the sequence …
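
The snippet is cut off, but the contrast it sets up (a latent state whose size does not grow with the sequence) suggests a simple capacity argument, sketched below as a back-of-the-envelope calculation; the bit counts and dimensions are illustrative assumptions, not figures from the paper.

```python
# Rough capacity argument: copying an arbitrary L-token string over a vocabulary
# of size V needs at least L * log2(V) bits, while a fixed-size state can carry
# only a bounded number of bits regardless of L.
import math

def copy_bits_required(seq_len: int, vocab_size: int) -> float:
    """Information needed to reproduce an arbitrary seq_len-token string."""
    return seq_len * math.log2(vocab_size)

def state_bits_available(state_dim: int, bits_per_unit: float = 16.0) -> float:
    """Crude upper bound on what a fixed-size state of state_dim units can hold."""
    return state_dim * bits_per_unit

# Example: copying 10,000 tokens from a 32k vocabulary needs ~150k bits,
# while a 4096-dim fp16 state holds at most ~65k bits.
print(copy_bits_required(10_000, 32_000))   # ~149,658 bits
print(state_bits_available(4096))           # 65,536 bits
```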