Position information in transformers: An overview

P Dufter, M Schmitt, H Schütze - Computational Linguistics, 2022 - direct.mit.edu
Transformers are arguably the main workhorse in recent natural language processing
research. By definition, a Transformer is invariant with respect to reordering of the input …
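The permutation property mentioned in this snippet can be checked directly. The following minimal sketch (illustrative only, not taken from the paper) shows that single-head scaled dot-product attention without positional encodings is permutation-equivariant: permuting the input tokens only permutes the output rows, so the model cannot distinguish different orderings.

```python
# Minimal sketch: attention without positional encodings is permutation-equivariant.
import numpy as np

def attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over the rows of x."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                       # 5 tokens, hidden dim 8
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
perm = rng.permutation(5)

out = attention(x, wq, wk, wv)
out_perm = attention(x[perm], wq, wk, wv)
print(np.allclose(out[perm], out_perm))           # True: only the row order changes
```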

ProteinBERT: a universal deep-learning model of protein sequence and function

N Brandes, D Ofer, Y Peleg, N Rappoport… - …, 2022 - academic.oup.com
Self-supervised deep language modeling has shown unprecedented success across natural
language tasks, and has recently been repurposed for biological sequences. However …

Learning to reason and memorize with self-notes

J Lanchantin, S Toshniwal, J Weston… - Advances in Neural …, 2024 - proceedings.neurips.cc
Large language models have been shown to struggle with multi-step reasoning, and do not
retain previous reasoning steps for future use. We propose a simple method for solving both …

Learnable Fourier features for multi-dimensional spatial positional encoding

Y Li, S Si, G Li, CJ Hsieh… - Advances in Neural …, 2021 - proceedings.neurips.cc
Attentional mechanisms are order-invariant. Positional encoding is a crucial component to
allow attention-based deep model architectures such as Transformer to address sequences …
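As an illustration of the kind of positional encoding this entry refers to, the sketch below follows the general recipe of learnable Fourier features for 2-D positions: a learnable frequency matrix, sin/cos features, and a small MLP mapping them to the model dimension. Class and parameter names are my own, and the details (initialization, number of frequencies, MLP shape) are assumptions rather than the authors' implementation.

```python
# Hedged sketch of a learnable Fourier-feature positional encoding for 2-D positions.
import torch
import torch.nn as nn

class FourierPositionalEncoding(nn.Module):
    def __init__(self, pos_dim=2, num_freqs=32, d_model=128):
        super().__init__()
        # Learnable frequencies, initialized like random Fourier features.
        self.freqs = nn.Parameter(torch.randn(pos_dim, num_freqs))
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_freqs, d_model), nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, positions):
        # positions: (..., pos_dim), e.g. pixel or patch coordinates.
        proj = 2 * torch.pi * positions @ self.freqs
        feats = torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
        return self.mlp(feats)                    # (..., d_model), added to token embeddings

# Example: encodings for a 4x4 grid of patch positions.
ys, xs = torch.meshgrid(torch.arange(4.), torch.arange(4.), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
pe = FourierPositionalEncoding()(grid)            # shape (16, 128)
```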

Salute the classic: Revisiting challenges of machine translation in the age of large language models

J Pang, F Ye, DF Wong, D Yu, S Shi, Z Tu… - Transactions of the …, 2025 - direct.mit.edu
The evolution of Neural Machine Translation (NMT) has been significantly
influenced by six core challenges (Koehn and Knowles) that have acted as benchmarks for …

One chatbot per person: Creating personalized chatbots based on implicit user profiles

Z Ma, Z Dou, Y Zhu, H Zhong, JR Wen - Proceedings of the 44th …, 2021 - dl.acm.org
Personalized chatbots focus on endowing chatbots with a consistent personality to behave
like real users, give more informative responses, and further act as personal assistants …

Sequence length is a domain: Length-based overfitting in transformer models

D Variš, O Bojar - arXiv preprint arXiv:2109.07276, 2021 - arxiv.org
Transformer-based sequence-to-sequence architectures, while achieving state-of-the-art
results on a large number of NLP tasks, can still suffer from overfitting during training. In …

Challenges of neural machine translation for short texts

Y Wan, B Yang, DF Wong, LS Chao, L Yao… - Computational …, 2022 - direct.mit.edu
Short texts (STs) appear in a variety of scenarios, including queries, dialogs, and entity names.
Most of the existing studies in neural machine translation (NMT) are focused on tackling …

SHAPE: Shifted absolute position embedding for transformers

S Kiyono, S Kobayashi, J Suzuki, K Inui - arXiv preprint arXiv:2109.05644, 2021 - arxiv.org
Position representation is crucial for building position-aware representations in
Transformers. Existing position representations suffer from a lack of generalization to test …

Generalized classification of satellite image time series with thermal positional encoding

J Nyborg, C Pelletier, I Assent - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Large-scale crop type classification is a task at the core of remote sensing efforts with
applications of both economic and ecological importance. Current state-of-the-art deep …