Lost in the middle: How language models use long contexts

NF Liu, K Lin, J Hewitt, A Paranjape… - Transactions of the …, 2024 - direct.mit.edu
While recent language models have the ability to take long contexts as input, relatively little
is known about how well they use longer context. We analyze the performance of language …
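The snippet stops short of the evaluation protocol, but the core idea can be sketched: place the passage containing the answer at different positions among distractor documents and track accuracy per position. The toy loop below assumes a multi-document QA setup; `query_model`, `build_prompt`, and the example format are hypothetical placeholders, not the paper's benchmark.

```python
# Sketch of a position-sweep evaluation for long-context models (assumption:
# a multi-document QA setup; query_model is a placeholder, not a real API).
import random

def query_model(prompt: str) -> str:
    """Stand-in for a call to a long-context language model."""
    return ""  # replace with a real model call

def build_prompt(gold_doc: str, distractors: list[str], gold_pos: int, question: str) -> str:
    docs = distractors[:gold_pos] + [gold_doc] + distractors[gold_pos:]
    context = "\n\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))
    return f"{context}\n\nQuestion: {question}\nAnswer:"

def accuracy_by_position(examples, num_distractors=9, trials=10):
    """Measure answer accuracy as a function of where the gold document sits."""
    subset = examples[:trials]
    scores = {}
    for pos in range(num_distractors + 1):
        correct = 0
        for gold_doc, distractor_pool, question, answer in subset:
            distractors = random.sample(distractor_pool, num_distractors)
            prediction = query_model(build_prompt(gold_doc, distractors, pos, question))
            correct += int(answer.lower() in prediction.lower())
        scores[pos] = correct / max(len(subset), 1)
    return scores  # in the paper's setting this curve is typically U-shaped
```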

Megabyte: Predicting million-byte sequences with multiscale transformers

L Yu, D Simig, C Flaherty… - Advances in …, 2023 - proceedings.neurips.cc
Autoregressive transformers are spectacular models for short sequences but scale poorly to
long sequences such as high-resolution images, podcasts, code, or books. We proposed …
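As a rough illustration of the multiscale idea named in the title, the PyTorch sketch below runs a coarse transformer over patches of byte embeddings and a local transformer within each patch, conditioned on the global output. Module sizes, the conditioning scheme, and the omission of causal masking are simplifying assumptions, not the paper's architecture.

```python
# Rough sketch of a multiscale byte-level model: a global model over patches
# conditions a local model over the bytes inside each patch. Causal masking
# and the byte-shifted inputs of the actual architecture are omitted.
import torch
import torch.nn as nn

class MultiscaleByteLM(nn.Module):
    def __init__(self, d_model=256, patch_size=8, vocab=256):
        super().__init__()
        self.patch_size = patch_size
        self.byte_embed = nn.Embedding(vocab, d_model)
        glb = nn.TransformerEncoderLayer(d_model * patch_size, nhead=8, batch_first=True)
        self.global_model = nn.TransformerEncoder(glb, num_layers=2)
        loc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.local_model = nn.TransformerEncoder(loc, num_layers=2)
        self.proj = nn.Linear(d_model * patch_size, d_model)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, bytes_in):                     # (batch, seq_len), seq_len % patch_size == 0
        b, t = bytes_in.shape
        p = self.patch_size
        x = self.byte_embed(bytes_in)                # (b, t, d)
        patches = x.view(b, t // p, p * x.size(-1))  # concatenate bytes into patch vectors
        global_out = self.global_model(patches)      # coarse model over patches
        cond = self.proj(global_out).unsqueeze(2)    # (b, t//p, 1, d)
        local_in = x.view(b, t // p, p, -1) + cond   # condition local model on global state
        local_out = self.local_model(local_in.reshape(b * (t // p), p, -1))
        return self.head(local_out).reshape(b, t, -1)  # per-byte logits

# e.g. logits = MultiscaleByteLM()(torch.randint(0, 256, (2, 64)))
```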

Personality traits in large language models

G Serapio-García, M Safdari, C Crepy, L Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
The advent of large language models (LLMs) has revolutionized natural language
processing, enabling the generation of coherent and contextually relevant human-like text …

Octopack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …

Memorizing transformers

Y Wu, MN Rabe, DL Hutchins, C Szegedy - arXiv preprint arXiv …, 2022 - arxiv.org
Language models typically need to be trained or finetuned in order to acquire new
knowledge, which involves updating their weights. We instead envision language models …
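One way to picture acquiring knowledge without weight updates is an external store of past attention keys and values that is searched at inference time. The NumPy sketch below uses exact top-k search and a standalone `memory_attention` helper, both of which are illustrative simplifications rather than the paper's implementation.

```python
# Toy sketch of attending over an external (key, value) memory via top-k
# nearest-neighbour lookup; exact search stands in for a real ANN index.
import numpy as np

class KNNMemory:
    """Append-only store of past attention keys/values, searched with exact top-k."""
    def __init__(self, d):
        self.keys = np.empty((0, d), dtype=np.float32)
        self.values = np.empty((0, d), dtype=np.float32)

    def add(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

    def lookup(self, queries, top_k=4):
        sims = queries @ self.keys.T                  # (n_queries, n_memories)
        idx = np.argsort(-sims, axis=1)[:, :top_k]    # nearest stored keys per query
        return self.keys[idx], self.values[idx]       # (n_queries, top_k, d)

def memory_attention(queries, memory, top_k=4):
    """Attend over retrieved memory entries alongside (or instead of) local context."""
    mem_k, mem_v = memory.lookup(queries, top_k)
    scores = np.einsum("qd,qkd->qk", queries, mem_k) / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("qk,qkd->qd", weights, mem_v)    # memory-derived context vectors
```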

Block-recurrent transformers

DL Hutchins, I Schlag, Y Wu, E Dyer… - Advances in neural …, 2022 - proceedings.neurips.cc
We introduce the Block-Recurrent Transformer, which applies a transformer layer in
a recurrent fashion along a sequence, and has linear complexity with respect to sequence …
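A minimal PyTorch sketch of that recurrence: a standard transformer layer is applied to one block at a time together with a fixed-size carried state, so cost grows linearly with the number of blocks. The GRU-style state update and the module sizes here are stand-ins, not the paper's gated design.

```python
# Minimal sketch of applying a transformer layer block-by-block with a carried
# state; the state update is a simplification of the paper's gated recurrence.
import torch
import torch.nn as nn

class BlockRecurrentSketch(nn.Module):
    def __init__(self, d_model=256, block_size=128, state_len=16, nhead=4):
        super().__init__()
        self.block_size = block_size
        self.state0 = nn.Parameter(torch.zeros(state_len, d_model))
        self.layer = nn.TransformerEncoderLayer(d_model, nhead=nhead, batch_first=True)
        self.state_update = nn.GRUCell(d_model, d_model)

    def forward(self, x):                            # x: (batch, seq_len, d_model)
        b, d = x.size(0), x.size(-1)
        state = self.state0.unsqueeze(0).expand(b, -1, -1)
        outputs = []
        for block in x.split(self.block_size, dim=1):
            # attention is confined to one block plus a fixed-size state, so
            # total cost grows linearly with the number of blocks
            h = self.layer(torch.cat([state, block], dim=1))
            state_out, block_out = h[:, : state.size(1)], h[:, state.size(1):]
            flat = self.state_update(state_out.reshape(-1, d), state.reshape(-1, d))
            state = flat.view_as(state)              # carry recurrent state to the next block
            outputs.append(block_out)
        return torch.cat(outputs, dim=1)
```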

The power of noise: Redefining retrieval for RAG systems

F Cuconasu, G Trappolini, F Siciliano, S Filice… - Proceedings of the 47th …, 2024 - dl.acm.org
Retrieval-Augmented Generation (RAG) has recently emerged as a method to extend
beyond the pre-trained knowledge of Large Language Models by augmenting the original …
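The augmentation step itself is simple to sketch: retrieve the documents judged most relevant to the query and prepend them to the prompt before generation. In the toy version below, the lexical `score` function and the `generate` stub are placeholders for a real retriever and language model; the paper's question is then how the mix of relevant, distracting, and random documents in that retrieved context affects answer quality.

```python
# Bare-bones retrieve-then-generate pattern; the lexical scorer and the
# generate() stub are placeholders, not a production retriever or LLM.
def score(query: str, doc: str) -> int:
    """Toy lexical relevance: number of shared word types."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], top_k: int = 3) -> list[str]:
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:top_k]

def generate(prompt: str) -> str:
    """Stand-in for a call to a language model."""
    return ""

def rag_answer(query: str, corpus: list[str]) -> str:
    context = "\n\n".join(retrieve(query, corpus))   # retrieved passages prepended to the prompt
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)
```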

Landmark attention: Random-access infinite context length for transformers

A Mohtashami, M Jaggi - arXiv preprint arXiv:2305.16300, 2023 - arxiv.org
While Transformers have shown remarkable success in natural language processing, their
attention mechanism's large memory requirements have limited their ability to handle longer …
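One way to keep that memory bounded is a two-stage lookup: score whole blocks by a summary key first, then attend only inside the chosen blocks. The NumPy sketch below uses the block mean as the summary, a stand-in for the learned landmark tokens; it is a schematic illustration, not the trained mechanism.

```python
# Schematic two-stage lookup: pick the most relevant blocks via summary keys,
# then run attention only within them (block mean stands in for landmarks).
import numpy as np

def blockwise_retrieval_attention(query, keys, values, block_size=64, top_blocks=2):
    n, d = keys.shape
    n_blocks = n // block_size
    k_blocks = keys[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    v_blocks = values[: n_blocks * block_size].reshape(n_blocks, block_size, d)

    # Stage 1: score each block by its summary key and keep the best few.
    landmarks = k_blocks.mean(axis=1)                        # (n_blocks, d)
    chosen = np.argsort(-(landmarks @ query))[:top_blocks]   # most relevant blocks

    # Stage 2: ordinary attention restricted to the selected blocks.
    k_sel = k_blocks[chosen].reshape(-1, d)
    v_sel = v_blocks[chosen].reshape(-1, d)
    scores = k_sel @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v_sel
```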

Exposing attention glitches with flip-flop language modeling

B Liu, J Ash, S Goel… - Advances in Neural …, 2024 - proceedings.neurips.cc
Why do large language models sometimes output factual inaccuracies and exhibit
erroneous reasoning? The brittleness of these models, particularly when executing long …
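The flip-flop task behind the title can be sketched as a generator: a sequence of write/read/ignore instructions in which every "read" must reproduce the most recently written bit, so any slip is directly attributable to attention failing to retrieve it. The token format below is an illustrative assumption.

```python
# Toy generator for a flip-flop sequence: at every "read" the correct
# completion is the most recently written bit; "ignore" adds distractor bits.
import random

def make_flip_flop(length=20, seed=0):
    rng = random.Random(seed)
    tokens, targets, last_written = ["write", "0"], [], "0"  # start with a definite state
    for _ in range(length):
        op = rng.choice(["write", "read", "ignore"])
        if op == "write":
            last_written = rng.choice(["0", "1"])
            tokens += ["write", last_written]
        elif op == "ignore":
            tokens += ["ignore", rng.choice(["0", "1"])]     # distractor bit to be skipped
        else:
            tokens += ["read", last_written]                  # correct completion
            targets.append((len(tokens) - 1, last_written))   # positions the model must get right
    return tokens, targets

tokens, targets = make_flip_flop()
print(" ".join(tokens))
```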