Position information in transformers: An overview

P Dufter, M Schmitt, H Schütze - Computational Linguistics, 2022 - direct.mit.edu
Transformers are arguably the main workhorse in recent natural language processing
research. By definition, a Transformer is invariant with respect to reordering of the input …
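The permutation property mentioned in this snippet can be checked directly. The following minimal sketch (illustrative only, not taken from the paper) shows that single-head scaled dot-product attention without positional encodings is permutation-equivariant: permuting the input tokens only permutes the output rows, so the model cannot distinguish different orderings.

```python
# Minimal sketch: attention without positional encodings is permutation-equivariant.
import numpy as np

def attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over the rows of x."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                       # 5 tokens, hidden dim 8
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
perm = rng.permutation(5)

out = attention(x, wq, wk, wv)
out_perm = attention(x[perm], wq, wk, wv)
print(np.allclose(out[perm], out_perm))           # True: only the row order changes
```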

ProteinBERT: a universal deep-learning model of protein sequence and function

N Brandes, D Ofer, Y Peleg, N Rappoport… - …, 2022 - academic.oup.com
Self-supervised deep language modeling has shown unprecedented success across natural
language tasks, and has recently been repurposed for biological sequences. However …

Learning to reason and memorize with self-notes

J Lanchantin, S Toshniwal, J Weston… - Advances in Neural …, 2024 - proceedings.neurips.cc
Large language models have been shown to struggle with multi-step reasoning, and do not
retain previous reasoning steps for future use. We propose a simple method for solving both …

Learnable Fourier features for multi-dimensional spatial positional encoding

Y Li, S Si, G Li, CJ Hsieh… - Advances in Neural …, 2021 - proceedings.neurips.cc
Attentional mechanisms are order-invariant. Positional encoding is a crucial component to
allow attention-based deep model architectures such as Transformer to address sequences …
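As an illustration of the kind of positional encoding this entry refers to, the sketch below follows the general recipe of learnable Fourier features for 2-D positions: a learnable frequency matrix, sin/cos features, and a small MLP mapping them to the model dimension. Class and parameter names are my own, and the details (initialization, number of frequencies, MLP shape) are assumptions rather than the authors' implementation.

```python
# Hedged sketch of a learnable Fourier-feature positional encoding for 2-D positions.
import torch
import torch.nn as nn

class FourierPositionalEncoding(nn.Module):
    def __init__(self, pos_dim=2, num_freqs=32, d_model=128):
        super().__init__()
        # Learnable frequencies, initialized like random Fourier features.
        self.freqs = nn.Parameter(torch.randn(pos_dim, num_freqs))
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_freqs, d_model), nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, positions):
        # positions: (..., pos_dim), e.g. pixel or patch coordinates.
        proj = 2 * torch.pi * positions @ self.freqs
        feats = torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
        return self.mlp(feats)                    # (..., d_model), added to token embeddings

# Example: encodings for a 4x4 grid of patch positions.
ys, xs = torch.meshgrid(torch.arange(4.), torch.arange(4.), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
pe = FourierPositionalEncoding()(grid)            # shape (16, 128)
```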

Salute the classic: Revisiting challenges of machine translation in the age of large language models

J Pang, F Ye, DF Wong, D Yu, S Shi, Z Tu… - Transactions of the …, 2025 - direct.mit.edu
The evolution of Neural Machine Translation (NMT) has been significantly
influenced by six core challenges (Koehn and Knowles) that have acted as benchmarks for …

One chatbot per person: Creating personalized chatbots based on implicit user profiles

Z Ma, Z Dou, Y Zhu, H Zhong, JR Wen - Proceedings of the 44th …, 2021 - dl.acm.org
Personalized chatbots focus on endowing chatbots with a consistent personality to behave
like real users, give more informative responses, and further act as personal assistants …

Sequence length is a domain: Length-based overfitting in transformer models

D Variš, O Bojar - arXiv preprint arXiv:2109.07276, 2021 - arxiv.org
Transformer-based sequence-to-sequence architectures, while achieving state-of-the-art
results on a large number of NLP tasks, can still suffer from overfitting during training. In …

Challenges of neural machine translation for short texts

Y Wan, B Yang, DF Wong, LS Chao, L Yao… - Computational …, 2022 - direct.mit.edu
Short texts (STs) appear in a variety of scenarios, including queries, dialogs, and entity names.
Most of the existing studies in neural machine translation (NMT) are focused on tackling …

SHAPE: Shifted absolute position embedding for transformers

S Kiyono, S Kobayashi, J Suzuki, K Inui - arXiv preprint arXiv:2109.05644, 2021 - arxiv.org
Position representation is crucial for building position-aware representations in
Transformers. Existing position representations suffer from a lack of generalization to test …

Generalized classification of satellite image time series with thermal positional encoding

J Nyborg, C Pelletier, I Assent - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Large-scale crop type classification is a task at the core of remote sensing efforts with
applications of both economic and ecological importance. Current state-of-the-art deep …