A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …
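For orientation, a minimal scaled dot-product attention sketch in NumPy; this is the generic mechanism the survey discusses, not code from the paper, and the function names are ours.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention; q, k, v have shape (seq_len, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])   # similarity of each query to each key
    return softmax(scores, axis=-1) @ v       # attention-weighted sum of values
```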

RoFormer: Enhanced transformer with rotary position embedding

J Su, M Ahmed, Y Lu, S Pan, W Bo, Y Liu - Neurocomputing, 2024 - Elsevier
Position encoding has recently been shown to be effective in the transformer architecture. It
enables valuable supervision for dependency modeling between elements at different …
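A minimal sketch of rotary position embedding as it is commonly implemented (half-split channel pairing, base 10000); applied to queries and keys before the dot product, it makes attention scores depend on relative position. Anything beyond that convention is an assumption, not code from the paper.

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embedding for x of shape (seq_len, dim), dim even.
    Each channel pair is rotated by an angle proportional to its position."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # per-pair rotation frequencies
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,      # rotate each (x1, x2) pair
                           x1 * sin + x2 * cos], axis=-1)
```

Applying rope to both queries and keys before q @ k.T yields scores that depend only on the offset between positions.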

Conditional positional encodings for vision transformers

X Chu, Z Tian, B Zhang, X Wang, C Shen - arXiv preprint arXiv …, 2021 - arxiv.org
We propose a conditional positional encoding (CPE) scheme for vision Transformers. Unlike
previous fixed or learnable positional encodings, which are pre-defined and independent of …
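The paper's positional encoding is generated by convolution over the token grid; below is a hedged NumPy stand-in that reshapes tokens to their 2D layout, applies a small zero-padded per-channel filter, and adds the result back. The kernel size and the averaging filter are illustrative assumptions in place of learned weights.

```python
import numpy as np

def conditional_positional_encoding(tokens, h, w, kernel=3):
    """tokens: (h*w, dim) patch embeddings laid out on an h-by-w grid."""
    dim = tokens.shape[-1]
    grid = tokens.reshape(h, w, dim)
    pad = kernel // 2
    padded = np.pad(grid, ((pad, pad), (pad, pad), (0, 0)))  # zero padding gives
    pos = np.zeros_like(grid)                                # border tokens an absolute cue
    for i in range(kernel):
        for j in range(kernel):
            pos += padded[i:i + h, j:j + w, :]
    return (grid + pos / kernel ** 2).reshape(h * w, dim)    # add position signal back
```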

EAPT: efficient attention pyramid transformer for image processing

X Lin, S Sun, W Huang, B Sheng, P Li… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Recent transformer-based models, especially patch-based methods, have shown great
potential in vision tasks. However, the fixed-size patch splitting divides the input features into …
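For context, the fixed-size patch splitting that the snippet refers to looks roughly like this (generic vision-transformer preprocessing, not the EAPT architecture itself):

```python
import numpy as np

def split_into_patches(image, patch=16):
    """image: (H, W, C) with H and W divisible by patch.
    Returns (num_patches, patch*patch*C) flattened patch tokens."""
    H, W, C = image.shape
    x = image.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)            # group the two patch axes together
    return x.reshape(-1, patch * patch * C)
```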

Design of a modified transformer architecture based on relative position coding

W Zheng, G Gong, J Tian, S Lu, R Wang, Z Yin… - International Journal of …, 2023 - Springer
Deep learning-based natural language processing (NLP) delivers strong performance for
generative dialogue systems, and the transformer model is a new boost in …
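The snippet does not show the paper's specific coding scheme; as a hedged illustration of relative position coding in general (in the style of Shaw et al.), the attention logit between positions i and j can receive a learned term indexed by the clipped offset j - i:

```python
import numpy as np

def attention_with_relative_positions(q, k, v, rel_emb, max_dist):
    """q, k, v: (n, d); rel_emb: (2*max_dist + 1, d) embeddings of clipped offsets."""
    n, d = q.shape
    offsets = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                      -max_dist, max_dist) + max_dist          # (n, n) offset indices
    scores = (q @ k.T + np.einsum('id,ijd->ij', q, rel_emb[offsets])) / np.sqrt(d)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v
```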

Rethinking positional encoding in language pre-training

G Ke, D He, TY Liu - arXiv preprint arXiv:2006.15595, 2020 - arxiv.org
In this work, we investigate the positional encoding methods used in language pre-training
(e.g., BERT) and identify several problems in the existing formulations. First, we show that in …
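One of the paper's proposals is to untie word and position correlations in the attention logits: content and position terms use separate projections and are summed, and the word-position cross terms are dropped. A hedged sketch, with random placeholders standing in for the learned projections:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16
words = rng.normal(size=(n, d))        # token (word) embeddings
positions = rng.normal(size=(n, d))    # absolute position embeddings

Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))   # projections for content
Uq, Uk = rng.normal(size=(d, d)), rng.normal(size=(d, d))   # projections for positions

word_term = (words @ Wq) @ (words @ Wk).T          # word-to-word correlation
pos_term = (positions @ Uq) @ (positions @ Uk).T   # position-to-position correlation
logits = (word_term + pos_term) / np.sqrt(2 * d)   # cross terms are not included
```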

Randomized positional encodings boost length generalization of transformers

A Ruoss, G Delétang, T Genewein… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have impressive generalization capabilities on tasks with a fixed context
length. However, they fail to generalize to sequences of arbitrary length, even for seemingly …
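The core trick, as we read it, is to train with position indices drawn from a range much larger than the training lengths, so that the position values needed for longer test sequences are not out of distribution; the sampled indices then select rows of any ordinary positional encoding table. A hedged sketch:

```python
import numpy as np

def randomized_positions(n, L_max=2048, seed=None):
    """Sample n sorted position indices without replacement from [0, L_max)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(L_max, size=n, replace=False))

# During training these indices replace the usual 0..n-1 when looking up
# sinusoidal or learned positional encodings; at test time longer sequences
# still fall inside the same [0, L_max) range.
```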

Position information in transformers: An overview

P Dufter, M Schmitt, H Schütze - Computational Linguistics, 2022 - direct.mit.edu
Transformers are arguably the main workhorse in recent natural language processing
research. By definition, a Transformer is invariant with respect to reordering of the input …
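The reordering point in the snippet is easy to check numerically: without positional encodings, self-attention is permutation-equivariant, so permuting the input just permutes the output and word order is invisible to the model. A small self-contained check:

```python
import numpy as np

def self_attention(x):
    """Plain self-attention without projections or positional encodings."""
    s = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(s - s.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ x

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))
perm = rng.permutation(5)
assert np.allclose(self_attention(x)[perm], self_attention(x[perm]))
```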

On position embeddings in BERT

B Wang, L Shang, C Lioma, X Jiang… - International …, 2020 - drive.google.com
Various Position Embeddings (PEs) have been proposed in Transformer-based
architectures (e.g., BERT) to model word order. These are empirically driven and perform well …
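For reference, the sinusoidal absolute position embedding from the original Transformer, one of the baselines such analyses compare BERT's learned embeddings against (standard formulation, even dimension assumed):

```python
import numpy as np

def sinusoidal_pe(seq_len, dim, base=10000.0):
    """Standard sinusoidal position embeddings of shape (seq_len, dim), dim even."""
    pos = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    i = np.arange(dim // 2)[None, :]                  # (1, dim/2)
    angles = pos / base ** (2 * i / dim)
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(angles)                      # even channels
    pe[:, 1::2] = np.cos(angles)                      # odd channels
    return pe
```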