Lost in the middle: How language models use long contexts

NF Liu, K Lin, J Hewitt, A Paranjape… - Transactions of the …, 2024 - direct.mit.edu
While recent language models have the ability to take long contexts as input, relatively little
is known about how well they use longer context. We analyze the performance of language …
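The snippet stops short of the evaluation protocol, but the core idea can be sketched: place the passage containing the answer at different positions among distractor documents and track accuracy per position. The toy loop below assumes a multi-document QA setup; `query_model`, `build_prompt`, and the example format are hypothetical placeholders, not the paper's benchmark.

```python
# Sketch of a position-sweep evaluation for long-context models (assumption:
# a multi-document QA setup; query_model is a placeholder, not a real API).
import random

def query_model(prompt: str) -> str:
    """Stand-in for a call to a long-context language model."""
    return ""  # replace with a real model call

def build_prompt(gold_doc: str, distractors: list[str], gold_pos: int, question: str) -> str:
    docs = distractors[:gold_pos] + [gold_doc] + distractors[gold_pos:]
    context = "\n\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))
    return f"{context}\n\nQuestion: {question}\nAnswer:"

def accuracy_by_position(examples, num_distractors=9, trials=10):
    """Measure answer accuracy as a function of where the gold document sits."""
    subset = examples[:trials]
    scores = {}
    for pos in range(num_distractors + 1):
        correct = 0
        for gold_doc, distractor_pool, question, answer in subset:
            distractors = random.sample(distractor_pool, num_distractors)
            prediction = query_model(build_prompt(gold_doc, distractors, pos, question))
            correct += int(answer.lower() in prediction.lower())
        scores[pos] = correct / max(len(subset), 1)
    return scores  # in the paper's setting this curve is typically U-shaped
```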

Megabyte: Predicting million-byte sequences with multiscale transformers

L Yu, D Simig, C Flaherty… - Advances in …, 2023 - proceedings.neurips.cc
Autoregressive transformers are spectacular models for short sequences but scale poorly to
long sequences such as high-resolution images, podcasts, code, or books. We proposed …
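As a rough illustration of the multiscale idea named in the title, the PyTorch sketch below runs a coarse transformer over patches of byte embeddings and a local transformer within each patch, conditioned on the global output. Module sizes, the conditioning scheme, and the omission of causal masking are simplifying assumptions, not the paper's architecture.

```python
# Rough sketch of a multiscale byte-level model: a global model over patches
# conditions a local model over the bytes inside each patch. Causal masking
# and the byte-shifted inputs of the actual architecture are omitted.
import torch
import torch.nn as nn

class MultiscaleByteLM(nn.Module):
    def __init__(self, d_model=256, patch_size=8, vocab=256):
        super().__init__()
        self.patch_size = patch_size
        self.byte_embed = nn.Embedding(vocab, d_model)
        glb = nn.TransformerEncoderLayer(d_model * patch_size, nhead=8, batch_first=True)
        self.global_model = nn.TransformerEncoder(glb, num_layers=2)
        loc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.local_model = nn.TransformerEncoder(loc, num_layers=2)
        self.proj = nn.Linear(d_model * patch_size, d_model)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, bytes_in):                     # (batch, seq_len), seq_len % patch_size == 0
        b, t = bytes_in.shape
        p = self.patch_size
        x = self.byte_embed(bytes_in)                # (b, t, d)
        patches = x.view(b, t // p, p * x.size(-1))  # concatenate bytes into patch vectors
        global_out = self.global_model(patches)      # coarse model over patches
        cond = self.proj(global_out).unsqueeze(2)    # (b, t//p, 1, d)
        local_in = x.view(b, t // p, p, -1) + cond   # condition local model on global state
        local_out = self.local_model(local_in.reshape(b * (t // p), p, -1))
        return self.head(local_out).reshape(b, t, -1)  # per-byte logits

# e.g. logits = MultiscaleByteLM()(torch.randint(0, 256, (2, 64)))
```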

Personality traits in large language models

G Serapio-García, M Safdari, C Crepy, L Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
The advent of large language models (LLMs) has revolutionized natural language
processing, enabling the generation of coherent and contextually relevant human-like text …

Octopack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …

Memorizing transformers

Y Wu, MN Rabe, DL Hutchins, C Szegedy - arXiv preprint arXiv …, 2022 - arxiv.org
Language models typically need to be trained or finetuned in order to acquire new
knowledge, which involves updating their weights. We instead envision language models …
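One way to picture acquiring knowledge without weight updates is an external store of past attention keys and values that is searched at inference time. The NumPy sketch below uses exact top-k search and a standalone `memory_attention` helper, both of which are illustrative simplifications rather than the paper's implementation.

```python
# Toy sketch of attending over an external (key, value) memory via top-k
# nearest-neighbour lookup; exact search stands in for a real ANN index.
import numpy as np

class KNNMemory:
    """Append-only store of past attention keys/values, searched with exact top-k."""
    def __init__(self, d):
        self.keys = np.empty((0, d), dtype=np.float32)
        self.values = np.empty((0, d), dtype=np.float32)

    def add(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

    def lookup(self, queries, top_k=4):
        sims = queries @ self.keys.T                  # (n_queries, n_memories)
        idx = np.argsort(-sims, axis=1)[:, :top_k]    # nearest stored keys per query
        return self.keys[idx], self.values[idx]       # (n_queries, top_k, d)

def memory_attention(queries, memory, top_k=4):
    """Attend over retrieved memory entries alongside (or instead of) local context."""
    mem_k, mem_v = memory.lookup(queries, top_k)
    scores = np.einsum("qd,qkd->qk", queries, mem_k) / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("qk,qkd->qd", weights, mem_v)    # memory-derived context vectors
```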

Block-recurrent transformers

DL Hutchins, I Schlag, Y Wu, E Dyer… - Advances in neural …, 2022 - proceedings.neurips.cc
We introduce the Block-Recurrent Transformer, which applies a transformer layer in
a recurrent fashion along a sequence, and has linear complexity with respect to sequence …
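A minimal PyTorch sketch of that recurrence: a standard transformer layer is applied to one block at a time together with a fixed-size carried state, so cost grows linearly with the number of blocks. The GRU-style state update and the module sizes here are stand-ins, not the paper's gated design.

```python
# Minimal sketch of applying a transformer layer block-by-block with a carried
# state; the state update is a simplification of the paper's gated recurrence.
import torch
import torch.nn as nn

class BlockRecurrentSketch(nn.Module):
    def __init__(self, d_model=256, block_size=128, state_len=16, nhead=4):
        super().__init__()
        self.block_size = block_size
        self.state0 = nn.Parameter(torch.zeros(state_len, d_model))
        self.layer = nn.TransformerEncoderLayer(d_model, nhead=nhead, batch_first=True)
        self.state_update = nn.GRUCell(d_model, d_model)

    def forward(self, x):                            # x: (batch, seq_len, d_model)
        b, d = x.size(0), x.size(-1)
        state = self.state0.unsqueeze(0).expand(b, -1, -1)
        outputs = []
        for block in x.split(self.block_size, dim=1):
            # attention is confined to one block plus a fixed-size state, so
            # total cost grows linearly with the number of blocks
            h = self.layer(torch.cat([state, block], dim=1))
            state_out, block_out = h[:, : state.size(1)], h[:, state.size(1):]
            flat = self.state_update(state_out.reshape(-1, d), state.reshape(-1, d))
            state = flat.view_as(state)              # carry recurrent state to the next block
            outputs.append(block_out)
        return torch.cat(outputs, dim=1)
```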

The power of noise: Redefining retrieval for RAG systems

F Cuconasu, G Trappolini, F Siciliano, S Filice… - Proceedings of the 47th …, 2024 - dl.acm.org
Retrieval-Augmented Generation (RAG) has recently emerged as a method to extend
beyond the pre-trained knowledge of Large Language Models by augmenting the original …
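The augmentation step itself is simple to sketch: retrieve the documents judged most relevant to the query and prepend them to the prompt before generation. In the toy version below, the lexical `score` function and the `generate` stub are placeholders for a real retriever and language model; the paper's question is then how the mix of relevant, distracting, and random documents in that retrieved context affects answer quality.

```python
# Bare-bones retrieve-then-generate pattern; the lexical scorer and the
# generate() stub are placeholders, not a production retriever or LLM.
def score(query: str, doc: str) -> int:
    """Toy lexical relevance: number of shared word types."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], top_k: int = 3) -> list[str]:
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:top_k]

def generate(prompt: str) -> str:
    """Stand-in for a call to a language model."""
    return ""

def rag_answer(query: str, corpus: list[str]) -> str:
    context = "\n\n".join(retrieve(query, corpus))   # retrieved passages prepended to the prompt
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)
```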

Landmark attention: Random-access infinite context length for transformers

A Mohtashami, M Jaggi - arXiv preprint arXiv:2305.16300, 2023 - arxiv.org
While Transformers have shown remarkable success in natural language processing, their
attention mechanism's large memory requirements have limited their ability to handle longer …
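One way to keep that memory bounded is a two-stage lookup: score whole blocks by a summary key first, then attend only inside the chosen blocks. The NumPy sketch below uses the block mean as the summary, a stand-in for the learned landmark tokens; it is a schematic illustration, not the trained mechanism.

```python
# Schematic two-stage lookup: pick the most relevant blocks via summary keys,
# then run attention only within them (block mean stands in for landmarks).
import numpy as np

def blockwise_retrieval_attention(query, keys, values, block_size=64, top_blocks=2):
    n, d = keys.shape
    n_blocks = n // block_size
    k_blocks = keys[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    v_blocks = values[: n_blocks * block_size].reshape(n_blocks, block_size, d)

    # Stage 1: score each block by its summary key and keep the best few.
    landmarks = k_blocks.mean(axis=1)                        # (n_blocks, d)
    chosen = np.argsort(-(landmarks @ query))[:top_blocks]   # most relevant blocks

    # Stage 2: ordinary attention restricted to the selected blocks.
    k_sel = k_blocks[chosen].reshape(-1, d)
    v_sel = v_blocks[chosen].reshape(-1, d)
    scores = k_sel @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v_sel
```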

Exposing attention glitches with flip-flop language modeling

B Liu, J Ash, S Goel… - Advances in Neural …, 2024 - proceedings.neurips.cc
Why do large language models sometimes output factual inaccuracies and exhibit
erroneous reasoning? The brittleness of these models, particularly when executing long …
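The flip-flop task behind the title can be sketched as a generator: a sequence of write/read/ignore instructions in which every "read" must reproduce the most recently written bit, so any slip is directly attributable to attention failing to retrieve it. The token format below is an illustrative assumption.

```python
# Toy generator for a flip-flop sequence: at every "read" the correct
# completion is the most recently written bit; "ignore" adds distractor bits.
import random

def make_flip_flop(length=20, seed=0):
    rng = random.Random(seed)
    tokens, targets, last_written = ["write", "0"], [], "0"  # start with a definite state
    for _ in range(length):
        op = rng.choice(["write", "read", "ignore"])
        if op == "write":
            last_written = rng.choice(["0", "1"])
            tokens += ["write", last_written]
        elif op == "ignore":
            tokens += ["ignore", rng.choice(["0", "1"])]     # distractor bit to be skipped
        else:
            tokens += ["read", last_written]                  # correct completion
            targets.append((len(tokens) - 1, last_written))   # positions the model must get right
    return tokens, targets

tokens, targets = make_flip_flop()
print(" ".join(tokens))
```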