Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …
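
Since every entry below builds on the attention operation such surveys cover, here is a minimal sketch of plain scaled dot-product self-attention in PyTorch; the single-head, unmasked setup and the toy shapes are simplifying assumptions for illustration, not any one surveyed variant.

```python
import torch

def scaled_dot_product_attention(q, k, v):
    """Plain softmax attention: softmax(Q K^T / sqrt(d)) V.

    q, k, v: (batch, seq_len, d_model) tensors; no masking or multi-head
    splitting, purely to show the core operation.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (batch, seq, seq) similarities
    weights = torch.softmax(scores, dim=-1)       # attention distribution per query
    return weights @ v                            # weighted sum of values

# Toy usage: 2 sequences of 5 tokens with 16-dim embeddings.
x = torch.randn(2, 5, 16)
print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([2, 5, 16])
```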

Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2024 - proceedings.neurips.cc
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …

Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting

H Wu, J Xu, J Wang, M Long - Advances in neural …, 2021 - proceedings.neurips.cc
Extending the forecasting time is a critical demand for real applications, such as extreme
weather early warning and long-term energy consumption planning. This paper studies the …
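
As a rough sketch of the series-decomposition idea that such decomposition transformers build on, the module below splits a series into a moving-average trend and a seasonal residual; the kernel size and end-padding scheme are illustrative assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class SeriesDecomposition(nn.Module):
    """Split a series into trend and seasonal parts via a moving average."""

    def __init__(self, kernel_size: int = 25):
        super().__init__()
        self.kernel_size = kernel_size
        self.avg = nn.AvgPool1d(kernel_size, stride=1)

    def forward(self, x: torch.Tensor):
        # x: (batch, length, channels)
        pad = (self.kernel_size - 1) // 2
        # Replicate the series ends so the moving average keeps the length.
        front = x[:, :1, :].repeat(1, pad, 1)
        back = x[:, -1:, :].repeat(1, self.kernel_size - 1 - pad, 1)
        padded = torch.cat([front, x, back], dim=1)
        trend = self.avg(padded.transpose(1, 2)).transpose(1, 2)
        seasonal = x - trend
        return seasonal, trend

# Toy usage: decompose a batch of 96-step, 7-variable series.
x = torch.randn(4, 96, 7)
seasonal, trend = SeriesDecomposition(kernel_size=25)(x)
print(seasonal.shape, trend.shape)  # both torch.Size([4, 96, 7])
```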

Coatnet: Marrying convolution and attention for all data sizes

Z Dai, H Liu, QV Le, M Tan - Advances in neural information …, 2021 - proceedings.neurips.cc
Transformers have attracted increasing interest in computer vision, but they still fall behind
state-of-the-art convolutional networks. In this work, we show that while Transformers tend to …
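
A loose sketch of the conv-then-attention layout this line of work explores: early convolutional stages build local features, late stages apply global self-attention. The real model uses MBConv blocks and relative attention, which are simplified here to a plain conv stem plus a standard TransformerEncoder.

```python
import torch
import torch.nn as nn

class ConvThenAttention(nn.Module):
    """Convolutional stages followed by attention stages (simplified hybrid)."""

    def __init__(self, in_ch=3, dim=64, num_classes=10):
        super().__init__()
        # Early stages: convolutions downsample and build local features.
        self.conv_stages = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.GELU(),
        )
        # Late stages: global self-attention over the flattened feature map.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.attn_stages = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        feats = self.conv_stages(x)                # (B, dim, H/4, W/4)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H*W/16, dim)
        tokens = self.attn_stages(tokens)
        return self.head(tokens.mean(dim=1))       # global average pooling

print(ConvThenAttention()(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```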

Roformer: Enhanced transformer with rotary position embedding

J Su, M Ahmed, Y Lu, S Pan, W Bo, Y Liu - Neurocomputing, 2024 - Elsevier
Position encoding has recently been shown to be effective in the transformer architecture. It
enables valuable supervision for dependency modeling between elements at different …
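
The sketch below illustrates the rotary idea: pairs of query/key channels are rotated by a position-dependent angle, so dot products between rotated queries and keys depend on relative position. The channel-pairing layout and base frequency are common implementation choices, not necessarily the paper's exact code.

```python
import torch

def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply a rotary position embedding to queries or keys.

    x: (batch, seq_len, dim) with even dim. Each channel pair is rotated
    by an angle proportional to its position.
    """
    b, n, d = x.shape
    half = d // 2
    pos = torch.arange(n, dtype=torch.float32)                          # positions 0..n-1
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.einsum("n,h->nh", pos, inv_freq)                     # (n, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]                               # channel pairs
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(2, 8, 64)
print(rotary_embedding(q).shape)  # torch.Size([2, 8, 64])
```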

Conditional positional encodings for vision transformers

X Chu, Z Tian, B Zhang, X Wang, C Shen - arXiv preprint arXiv …, 2021 - arxiv.org
We propose a conditional positional encoding (CPE) scheme for vision Transformers. Unlike
previous fixed or learnable positional encodings, which are pre-defined and independent of …
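
A sketch of the conditional idea: instead of a fixed or learned table, the positional signal is generated from the tokens themselves with a depthwise convolution over the 2D token grid and added back as a residual. The kernel size and residual form here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PositionalEncodingGenerator(nn.Module):
    """Input-conditioned positional encoding for vision transformer tokens."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # tokens: (batch, h*w, dim) patch tokens (a class token would be handled outside).
        b, n, d = tokens.shape
        grid = tokens.transpose(1, 2).reshape(b, d, h, w)  # back to a 2D grid
        pos = self.dwconv(grid)                            # encoding conditioned on the input
        return tokens + pos.flatten(2).transpose(1, 2)     # residual add

tokens = torch.randn(2, 14 * 14, 192)
print(PositionalEncodingGenerator(192)(tokens, h=14, w=14).shape)  # torch.Size([2, 196, 192])
```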

Rethinking attention with performers

K Choromanski, V Likhosherstov, D Dohan… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Performers, Transformer architectures which can estimate regular (softmax)
full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to …
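
A rough sketch of the random-feature trick behind this linear-time approximation: queries and keys are mapped through positive random features so attention can be computed as phi(Q)(phi(K)^T V), with cost linear in sequence length. The feature count and the lack of orthogonalised projections are simplifications of the full method.

```python
import torch

def performer_attention(q, k, v, num_features: int = 64):
    """Linear-time softmax-attention approximation via positive random features."""
    b, n, d = q.shape
    q, k = q / d ** 0.25, k / d ** 0.25          # fold in the 1/sqrt(d) scaling
    w = torch.randn(d, num_features)             # random projection (not orthogonalised here)

    def phi(x):
        # Positive features: exp(w^T x - |x|^2 / 2) / sqrt(m)
        return torch.exp(x @ w - (x ** 2).sum(-1, keepdim=True) / 2) / num_features ** 0.5

    q_f, k_f = phi(q), phi(k)                                  # (b, n, m)
    kv = torch.einsum("bnm,bnd->bmd", k_f, v)                  # summarise keys and values
    z = 1.0 / (q_f @ k_f.sum(dim=1, keepdim=True).transpose(1, 2) + 1e-6)  # normaliser
    return torch.einsum("bnm,bmd->bnd", q_f, kv) * z

q = k = v = torch.randn(2, 128, 32)
print(performer_attention(q, k, v).shape)  # torch.Size([2, 128, 32])
```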

A transformer-based framework for multivariate time series representation learning

G Zerveas, S Jayaraman, D Patel… - Proceedings of the 27th …, 2021 - dl.acm.org
We present a novel framework for multivariate time series representation learning based on
the transformer encoder architecture. The framework includes an unsupervised pre-training …
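
A sketch of the masked-imputation style of unsupervised pre-training such frameworks use: hide a random subset of input values, run the encoder, and regress the hidden values with an MSE loss. The masking scheme, ratio, and the stand-in encoder are illustrative assumptions.

```python
import torch
import torch.nn as nn

def masked_value_pretraining_step(model: nn.Module, x: torch.Tensor,
                                  mask_ratio: float = 0.15) -> torch.Tensor:
    """One unsupervised pre-training step: reconstruct randomly masked values."""
    # x: (batch, seq_len, n_vars)
    mask = torch.rand_like(x) < mask_ratio     # True where values are hidden
    corrupted = x.masked_fill(mask, 0.0)       # zero out the masked entries
    pred = model(corrupted)                    # (batch, seq_len, n_vars)
    return ((pred - x)[mask] ** 2).mean()      # MSE only on masked positions

# Toy usage with a stand-in encoder mapping the series back to its own shape.
enc = nn.Sequential(nn.Linear(7, 64), nn.ReLU(), nn.Linear(64, 7))
loss = masked_value_pretraining_step(enc, torch.randn(8, 96, 7))
loss.backward()
```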

Scan and snap: Understanding training dynamics and token composition in 1-layer transformer

Y Tian, Y Wang, B Chen, SS Du - Advances in Neural …, 2023 - proceedings.neurips.cc
Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …
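
For concreteness, a minimal next-token predictor in the spirit of the 1-layer setting studied here: a single self-attention layer followed by a linear decoder. The vocabulary size, embedding width, and the absence of an MLP block are simplifying assumptions for illustration.

```python
import torch
import torch.nn as nn

class OneLayerTransformer(nn.Module):
    """A single self-attention layer followed by a linear decoder."""

    def __init__(self, vocab_size: int = 100, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.decoder = nn.Linear(dim, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer ids; a causal mask keeps it autoregressive.
        x = self.embed(tokens)
        n = tokens.shape[1]
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h, _ = self.attn(x, x, x, attn_mask=causal)
        return self.decoder(h)                 # next-token logits per position

logits = OneLayerTransformer()(torch.randint(0, 100, (4, 16)))
print(logits.shape)  # torch.Size([4, 16, 100])
```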