A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …
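For orientation, a minimal scaled dot-product attention sketch in NumPy; this is the generic mechanism the survey discusses, not code from the paper, and the function names are ours.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention; q, k, v have shape (seq_len, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])   # similarity of each query to each key
    return softmax(scores, axis=-1) @ v       # attention-weighted sum of values
```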

RoFormer: Enhanced transformer with rotary position embedding

J Su, M Ahmed, Y Lu, S Pan, W Bo, Y Liu - Neurocomputing, 2024 - Elsevier
Position encoding has recently been shown to be effective in the transformer architecture. It
enables valuable supervision for dependency modeling between elements at different …
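A minimal sketch of rotary position embedding as it is commonly implemented (half-split channel pairing, base 10000); applied to queries and keys before the dot product, it makes attention scores depend on relative position. Anything beyond that convention is an assumption, not code from the paper.

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embedding for x of shape (seq_len, dim), dim even.
    Each channel pair is rotated by an angle proportional to its position."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # per-pair rotation frequencies
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,      # rotate each (x1, x2) pair
                           x1 * sin + x2 * cos], axis=-1)
```

Applying rope to both queries and keys before q @ k.T yields scores that depend only on the offset between positions.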

Conditional positional encodings for vision transformers

X Chu, Z Tian, B Zhang, X Wang, C Shen - arXiv preprint arXiv …, 2021 - arxiv.org
We propose a conditional positional encoding (CPE) scheme for vision Transformers. Unlike
previous fixed or learnable positional encodings, which are pre-defined and independent of …
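The paper's positional encoding is generated by convolution over the token grid; below is a hedged NumPy stand-in that reshapes tokens to their 2D layout, applies a small zero-padded per-channel filter, and adds the result back. The kernel size and the averaging filter are illustrative assumptions in place of learned weights.

```python
import numpy as np

def conditional_positional_encoding(tokens, h, w, kernel=3):
    """tokens: (h*w, dim) patch embeddings laid out on an h-by-w grid."""
    dim = tokens.shape[-1]
    grid = tokens.reshape(h, w, dim)
    pad = kernel // 2
    padded = np.pad(grid, ((pad, pad), (pad, pad), (0, 0)))  # zero padding gives
    pos = np.zeros_like(grid)                                # border tokens an absolute cue
    for i in range(kernel):
        for j in range(kernel):
            pos += padded[i:i + h, j:j + w, :]
    return (grid + pos / kernel ** 2).reshape(h * w, dim)    # add position signal back
```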

EAPT: efficient attention pyramid transformer for image processing

X Lin, S Sun, W Huang, B Sheng, P Li… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Recent transformer-based models, especially patch-based methods, have shown great
potential in vision tasks. However, the fixed-size patch splitting divides the input features into …
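For context, the fixed-size patch splitting that the snippet refers to looks roughly like this (generic vision-transformer preprocessing, not the EAPT architecture itself):

```python
import numpy as np

def split_into_patches(image, patch=16):
    """image: (H, W, C) with H and W divisible by patch.
    Returns (num_patches, patch*patch*C) flattened patch tokens."""
    H, W, C = image.shape
    x = image.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)            # group the two patch axes together
    return x.reshape(-1, patch * patch * C)
```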

Design of a modified transformer architecture based on relative position coding

W Zheng, G Gong, J Tian, S Lu, R Wang, Z Yin… - International Journal of …, 2023 - Springer
Deep learning-based natural language processing (NLP) delivers strong performance for
generative dialogue systems, and the transformer model is a new boost in …
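The snippet does not show the paper's specific coding scheme; as a hedged illustration of relative position coding in general (in the style of Shaw et al.), the attention logit between positions i and j can receive a learned term indexed by the clipped offset j - i:

```python
import numpy as np

def attention_with_relative_positions(q, k, v, rel_emb, max_dist):
    """q, k, v: (n, d); rel_emb: (2*max_dist + 1, d) embeddings of clipped offsets."""
    n, d = q.shape
    offsets = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                      -max_dist, max_dist) + max_dist          # (n, n) offset indices
    scores = (q @ k.T + np.einsum('id,ijd->ij', q, rel_emb[offsets])) / np.sqrt(d)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v
```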

Rethinking positional encoding in language pre-training

G Ke, D He, TY Liu - arXiv preprint arXiv:2006.15595, 2020 - arxiv.org
In this work, we investigate the positional encoding methods used in language pre-training
(e.g., BERT) and identify several problems in the existing formulations. First, we show that in …
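One of the paper's proposals is to untie word and position correlations in the attention logits: content and position terms use separate projections and are summed, and the word-position cross terms are dropped. A hedged sketch, with random placeholders standing in for the learned projections:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16
words = rng.normal(size=(n, d))        # token (word) embeddings
positions = rng.normal(size=(n, d))    # absolute position embeddings

Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))   # projections for content
Uq, Uk = rng.normal(size=(d, d)), rng.normal(size=(d, d))   # projections for positions

word_term = (words @ Wq) @ (words @ Wk).T          # word-to-word correlation
pos_term = (positions @ Uq) @ (positions @ Uk).T   # position-to-position correlation
logits = (word_term + pos_term) / np.sqrt(2 * d)   # cross terms are not included
```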

Randomized positional encodings boost length generalization of transformers

A Ruoss, G Delétang, T Genewein… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have impressive generalization capabilities on tasks with a fixed context
length. However, they fail to generalize to sequences of arbitrary length, even for seemingly …
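The core trick, as we read it, is to train with position indices drawn from a range much larger than the training lengths, so that the position values needed for longer test sequences are not out of distribution; the sampled indices then select rows of any ordinary positional encoding table. A hedged sketch:

```python
import numpy as np

def randomized_positions(n, L_max=2048, seed=None):
    """Sample n sorted position indices without replacement from [0, L_max)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(L_max, size=n, replace=False))

# During training these indices replace the usual 0..n-1 when looking up
# sinusoidal or learned positional encodings; at test time longer sequences
# still fall inside the same [0, L_max) range.
```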

Position information in transformers: An overview

P Dufter, M Schmitt, H Schütze - Computational Linguistics, 2022 - direct.mit.edu
Transformers are arguably the main workhorse in recent natural language processing
research. By definition, a Transformer is invariant with respect to reordering of the input …
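The reordering point in the snippet is easy to check numerically: without positional encodings, self-attention is permutation-equivariant, so permuting the input just permutes the output and word order is invisible to the model. A small self-contained check:

```python
import numpy as np

def self_attention(x):
    """Plain self-attention without projections or positional encodings."""
    s = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(s - s.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ x

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))
perm = rng.permutation(5)
assert np.allclose(self_attention(x)[perm], self_attention(x[perm]))
```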

On position embeddings in BERT

B Wang, L Shang, C Lioma, X Jiang… - International …, 2020 - drive.google.com
Various Position Embeddings (PEs) have been proposed in Transformer-based
architectures (e.g., BERT) to model word order. These are empirically driven and perform well …
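For reference, the sinusoidal absolute position embedding from the original Transformer, one of the baselines such analyses compare BERT's learned embeddings against (standard formulation, even dimension assumed):

```python
import numpy as np

def sinusoidal_pe(seq_len, dim, base=10000.0):
    """Standard sinusoidal position embeddings of shape (seq_len, dim), dim even."""
    pos = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    i = np.arange(dim // 2)[None, :]                  # (1, dim/2)
    angles = pos / base ** (2 * i / dim)
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(angles)                      # even channels
    pe[:, 1::2] = np.cos(angles)                      # odd channels
    return pe
```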