A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - arxiv.org
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
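
The snippet stops before describing the method itself, but the title's "linear-time" claim rests on replacing attention with a state-space recurrence that does a fixed amount of work per token. Below is a minimal sketch of a generic discretized linear SSM scan; it only illustrates that linear-time idea, with arbitrary fixed (non-selective) A, B, C matrices, and is not Mamba's input-dependent selective parameterization.

```python
import numpy as np

def linear_ssm_scan(x, A, B, C):
    """Generic discretized state-space recurrence (a sketch, not Mamba's
    selective variant): h_t = A @ h_{t-1} + B @ x_t,  y_t = C @ h_t.
    Constant cost per step => O(sequence_length) time overall."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # single pass over the sequence
        h = A @ h + B @ x_t       # state update
        ys.append(C @ h)          # readout
    return np.stack(ys)

# Toy usage: 1-D input channel, 4-D hidden state (all values are placeholders).
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)               # stable fixed dynamics (assumption)
B = rng.normal(size=(4, 1))
C = rng.normal(size=(1, 4))
x = rng.normal(size=(1000, 1))    # sequence of length 1000
y = linear_ssm_scan(x, A, B, C)   # y.shape == (1000, 1)
```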

RWKV: Reinventing RNNs for the transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …
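
The quadratic scaling mentioned in the snippet comes from attention materializing an n×n score matrix over the sequence. The small NumPy sketch below of vanilla single-head attention makes that cost explicit; shapes and sizes are illustrative, and no RWKV-specific machinery is shown.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Vanilla single-head attention (illustrative sketch).
    `scores` has shape (n, n), so time and memory grow quadratically
    with the sequence length n -- the cost RNN-style models avoid."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n, n) matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (n, d) output

n, d = 2048, 64                                        # toy sizes
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = naive_attention(Q, K, V)                         # materializes a 2048 x 2048 matrix
```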

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

M Reid, N Savinov, D Teplyashin, D Lepikhin… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly
compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning …

Protein design with guided discrete diffusion

N Gruver, S Stanton, N Frey… - Advances in neural …, 2024 - proceedings.neurips.cc
A popular approach to protein design is to combine a generative model with a discriminative
model for conditional sampling. The generative model samples plausible sequences while …
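
The snippet describes the general recipe of pairing a generative model with a discriminative model for conditional sampling. The placeholder code below illustrates only the simplest form of that idea, propose-then-rerank, and is not the paper's guided discrete diffusion procedure; `generate` and `score` are hypothetical callables standing in for the two models.

```python
import heapq
from typing import Callable, Sequence

def conditional_sample(
    generate: Callable[[], str],        # placeholder: draws one plausible sequence
    score: Callable[[str], float],      # placeholder: discriminator's fitness score
    n_candidates: int = 256,
    top_k: int = 8,
) -> Sequence[str]:
    """Simplest generative + discriminative recipe (a sketch, not guided
    discrete diffusion): propose sequences with the generative model,
    keep the candidates the discriminative model scores highest."""
    candidates = [generate() for _ in range(n_candidates)]
    return heapq.nlargest(top_k, candidates, key=score)
```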

Hierarchically gated recurrent neural network for sequence modeling

Z Qin, S Yang, Y Zhong - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Transformers have surpassed RNNs in popularity due to their superior abilities in parallel
training and long-term dependency modeling. Recently, there has been a renewed interest …

A survey on Vision Mamba: Models, applications and challenges

R Xu, S Yang, Y Wang, B Du, H Chen - arXiv preprint arXiv:2404.18861, 2024 - arxiv.org
Mamba, a recent selective structured state space model, performs excellently on long
sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional …

Mamba-360: Survey of state space models as transformer alternative for long sequence modelling: Methods, applications, and challenges

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2404.16112, 2024 - arxiv.org
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …

Laughing hyena distillery: Extracting compact recurrences from convolutions

S Massaroli, M Poli, D Fu, H Kumbong… - Advances in …, 2024 - proceedings.neurips.cc
Recent advances in attention-free sequence models rely on convolutions as alternatives to
the attention operator at the core of Transformers. In particular, long convolution sequence …
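
Long-convolution sequence models, as referenced in the snippet, apply a filter roughly as long as the input; the standard way to evaluate this efficiently is FFT-based convolution in O(n log n). The sketch below shows only that generic trick, not the paper's distillation of convolutions into compact recurrences; the decaying filter is a toy stand-in.

```python
import numpy as np

def long_causal_conv(x, h):
    """Causal convolution of a length-n signal x with a length-n filter h
    via FFT in O(n log n) -- the usual trick behind long-convolution
    sequence layers (generic sketch, unrelated to the distillation method)."""
    n = len(x)
    m = 2 * n                                   # zero-pad to avoid circular wrap-around
    y = np.fft.irfft(np.fft.rfft(x, m) * np.fft.rfft(h, m), m)
    return y[:n]                                # keep the causal part

rng = np.random.default_rng(0)
x = rng.normal(size=4096)                       # input sequence
h = np.exp(-0.01 * np.arange(4096))             # toy decaying filter (assumption)
y = long_causal_conv(x, h)                      # matches direct O(n^2) convolution, truncated to n
```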

Exposing attention glitches with flip-flop language modeling

B Liu, J Ash, S Goel… - Advances in Neural …, 2024 - proceedings.neurips.cc
Why do large language models sometimes output factual inaccuracies and exhibit
erroneous reasoning? The brittleness of these models, particularly when executing long …