Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …
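
Since every entry below builds on the attention operation such surveys cover, here is a minimal sketch of plain scaled dot-product self-attention in PyTorch; the single-head, unmasked setup and the toy shapes are simplifying assumptions for illustration, not any one surveyed variant.

```python
import torch

def scaled_dot_product_attention(q, k, v):
    """Plain softmax attention: softmax(Q K^T / sqrt(d)) V.

    q, k, v: (batch, seq_len, d_model) tensors; no masking or multi-head
    splitting, purely to show the core operation.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (batch, seq, seq) similarities
    weights = torch.softmax(scores, dim=-1)       # attention distribution per query
    return weights @ v                            # weighted sum of values

# Toy usage: 2 sequences of 5 tokens with 16-dim embeddings.
x = torch.randn(2, 5, 16)
print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([2, 5, 16])
```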

Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2024 - proceedings.neurips.cc
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …

Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting

H Wu, J Xu, J Wang, M Long - Advances in neural …, 2021 - proceedings.neurips.cc
Extending the forecasting time is a critical demand for real applications, such as extreme
weather early warning and long-term energy consumption planning. This paper studies the …
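
As a rough sketch of the series-decomposition idea that such decomposition transformers build on, the module below splits a series into a moving-average trend and a seasonal residual; the kernel size and end-padding scheme are illustrative assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class SeriesDecomposition(nn.Module):
    """Split a series into trend and seasonal parts via a moving average."""

    def __init__(self, kernel_size: int = 25):
        super().__init__()
        self.kernel_size = kernel_size
        self.avg = nn.AvgPool1d(kernel_size, stride=1)

    def forward(self, x: torch.Tensor):
        # x: (batch, length, channels)
        pad = (self.kernel_size - 1) // 2
        # Replicate the series ends so the moving average keeps the length.
        front = x[:, :1, :].repeat(1, pad, 1)
        back = x[:, -1:, :].repeat(1, self.kernel_size - 1 - pad, 1)
        padded = torch.cat([front, x, back], dim=1)
        trend = self.avg(padded.transpose(1, 2)).transpose(1, 2)
        seasonal = x - trend
        return seasonal, trend

# Toy usage: decompose a batch of 96-step, 7-variable series.
x = torch.randn(4, 96, 7)
seasonal, trend = SeriesDecomposition(kernel_size=25)(x)
print(seasonal.shape, trend.shape)  # both torch.Size([4, 96, 7])
```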

Coatnet: Marrying convolution and attention for all data sizes

Z Dai, H Liu, QV Le, M Tan - Advances in neural information …, 2021 - proceedings.neurips.cc
Transformers have attracted increasing interest in computer vision, but they still fall behind
state-of-the-art convolutional networks. In this work, we show that while Transformers tend to …
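
A loose sketch of the conv-then-attention layout this line of work explores: early convolutional stages build local features, late stages apply global self-attention. The real model uses MBConv blocks and relative attention, which are simplified here to a plain conv stem plus a standard TransformerEncoder.

```python
import torch
import torch.nn as nn

class ConvThenAttention(nn.Module):
    """Convolutional stages followed by attention stages (simplified hybrid)."""

    def __init__(self, in_ch=3, dim=64, num_classes=10):
        super().__init__()
        # Early stages: convolutions downsample and build local features.
        self.conv_stages = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.GELU(),
        )
        # Late stages: global self-attention over the flattened feature map.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.attn_stages = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        feats = self.conv_stages(x)                # (B, dim, H/4, W/4)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H*W/16, dim)
        tokens = self.attn_stages(tokens)
        return self.head(tokens.mean(dim=1))       # global average pooling

print(ConvThenAttention()(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```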

Roformer: Enhanced transformer with rotary position embedding

J Su, M Ahmed, Y Lu, S Pan, W Bo, Y Liu - Neurocomputing, 2024 - Elsevier
Position encoding has recently been shown to be effective in the transformer architecture. It
enables valuable supervision for dependency modeling between elements at different …
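
The sketch below illustrates the rotary idea: pairs of query/key channels are rotated by a position-dependent angle, so dot products between rotated queries and keys depend on relative position. The channel-pairing layout and base frequency are common implementation choices, not necessarily the paper's exact code.

```python
import torch

def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply a rotary position embedding to queries or keys.

    x: (batch, seq_len, dim) with even dim. Each channel pair is rotated
    by an angle proportional to its position.
    """
    b, n, d = x.shape
    half = d // 2
    pos = torch.arange(n, dtype=torch.float32)                          # positions 0..n-1
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.einsum("n,h->nh", pos, inv_freq)                     # (n, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]                               # channel pairs
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(2, 8, 64)
print(rotary_embedding(q).shape)  # torch.Size([2, 8, 64])
```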

Conditional positional encodings for vision transformers

X Chu, Z Tian, B Zhang, X Wang, C Shen - arXiv preprint arXiv …, 2021 - arxiv.org
We propose a conditional positional encoding (CPE) scheme for vision Transformers. Unlike
previous fixed or learnable positional encodings, which are pre-defined and independent of …
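
A sketch of the conditional idea: instead of a fixed or learned table, the positional signal is generated from the tokens themselves with a depthwise convolution over the 2D token grid and added back as a residual. The kernel size and residual form here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PositionalEncodingGenerator(nn.Module):
    """Input-conditioned positional encoding for vision transformer tokens."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # tokens: (batch, h*w, dim) patch tokens (a class token would be handled outside).
        b, n, d = tokens.shape
        grid = tokens.transpose(1, 2).reshape(b, d, h, w)  # back to a 2D grid
        pos = self.dwconv(grid)                            # encoding conditioned on the input
        return tokens + pos.flatten(2).transpose(1, 2)     # residual add

tokens = torch.randn(2, 14 * 14, 192)
print(PositionalEncodingGenerator(192)(tokens, h=14, w=14).shape)  # torch.Size([2, 196, 192])
```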

Rethinking attention with performers

K Choromanski, V Likhosherstov, D Dohan… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
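
A rough sketch of the random-feature trick behind this linear-time approximation: queries and keys are mapped through positive random features so attention can be computed as phi(Q)(phi(K)^T V), with cost linear in sequence length. The feature count and the lack of orthogonalised projections are simplifications of the full method.

```python
import torch

def performer_attention(q, k, v, num_features: int = 64):
    """Linear-time softmax-attention approximation via positive random features."""
    b, n, d = q.shape
    q, k = q / d ** 0.25, k / d ** 0.25          # fold in the 1/sqrt(d) scaling
    w = torch.randn(d, num_features)             # random projection (not orthogonalised here)

    def phi(x):
        # Positive features: exp(w^T x - |x|^2 / 2) / sqrt(m)
        return torch.exp(x @ w - (x ** 2).sum(-1, keepdim=True) / 2) / num_features ** 0.5

    q_f, k_f = phi(q), phi(k)                                  # (b, n, m)
    kv = torch.einsum("bnm,bnd->bmd", k_f, v)                  # summarise keys and values
    z = 1.0 / (q_f @ k_f.sum(dim=1, keepdim=True).transpose(1, 2) + 1e-6)  # normaliser
    return torch.einsum("bnm,bmd->bnd", q_f, kv) * z

q = k = v = torch.randn(2, 128, 32)
print(performer_attention(q, k, v).shape)  # torch.Size([2, 128, 32])
```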

A transformer-based framework for multivariate time series representation learning

G Zerveas, S Jayaraman, D Patel… - Proceedings of the 27th …, 2021 - dl.acm.org
We present a novel framework for multivariate time series representation learning based on
the transformer encoder architecture. The framework includes an unsupervised pre-training …
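
A sketch of the masked-imputation style of unsupervised pre-training such frameworks use: hide a random subset of input values, run the encoder, and regress the hidden values with an MSE loss. The masking scheme, ratio, and the stand-in encoder are illustrative assumptions.

```python
import torch
import torch.nn as nn

def masked_value_pretraining_step(model: nn.Module, x: torch.Tensor,
                                  mask_ratio: float = 0.15) -> torch.Tensor:
    """One unsupervised pre-training step: reconstruct randomly masked values."""
    # x: (batch, seq_len, n_vars)
    mask = torch.rand_like(x) < mask_ratio     # True where values are hidden
    corrupted = x.masked_fill(mask, 0.0)       # zero out the masked entries
    pred = model(corrupted)                    # (batch, seq_len, n_vars)
    return ((pred - x)[mask] ** 2).mean()      # MSE only on masked positions

# Toy usage with a stand-in encoder mapping the series back to its own shape.
enc = nn.Sequential(nn.Linear(7, 64), nn.ReLU(), nn.Linear(64, 7))
loss = masked_value_pretraining_step(enc, torch.randn(8, 96, 7))
loss.backward()
```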

Scan and snap: Understanding training dynamics and token composition in 1-layer transformer

Y Tian, Y Wang, B Chen, SS Du - Advances in Neural …, 2023 - proceedings.neurips.cc
Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …
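
For concreteness, a minimal next-token predictor in the spirit of the 1-layer setting studied here: a single self-attention layer followed by a linear decoder. The vocabulary size, embedding width, and the absence of an MLP block are simplifying assumptions for illustration.

```python
import torch
import torch.nn as nn

class OneLayerTransformer(nn.Module):
    """A single self-attention layer followed by a linear decoder."""

    def __init__(self, vocab_size: int = 100, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.decoder = nn.Linear(dim, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer ids; a causal mask keeps it autoregressive.
        x = self.embed(tokens)
        n = tokens.shape[1]
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h, _ = self.attn(x, x, x, attn_mask=causal)
        return self.decoder(h)                 # next-token logits per position

logits = OneLayerTransformer()(torch.randint(0, 100, (4, 16)))
print(logits.shape)  # torch.Size([4, 16, 100])
```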