On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

A taxonomy and review of generalization research in NLP

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - Nature Machine …, 2023 - nature.com
The ability to generalize well is one of the primary desiderata for models of natural language
processing (NLP), but what 'good generalization' entails and how it should be evaluated is …

State-of-the-art generalisation research in NLP: a taxonomy and review

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …

The paradox of the compositionality of natural language: A neural machine translation case study

V Dankers, E Bruni, D Hupkes - arXiv preprint arXiv:2108.05885, 2021 - arxiv.org
Obtaining human-like performance in NLP is often argued to require compositional
generalisation. Whether neural networks exhibit this ability is usually studied by training …

Can transformer be too compositional? Analysing idiom processing in neural machine translation

V Dankers, CG Lucas, I Titov - arXiv preprint arXiv:2205.15301, 2022 - arxiv.org
Unlike literal expressions, idioms' meanings do not directly follow from their parts, posing a
challenge for neural machine translation (NMT). NMT models are often unable to translate …

Sequence-to-sequence learning with latent neural grammars

Y Kim - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Sequence-to-sequence learning with neural networks has become the de facto standard for
sequence modeling. This approach typically models the local distribution over the next …

The neural data router: Adaptive control flow in transformers improves systematic generalization

R Csordás, K Irie, J Schmidhuber - arXiv preprint arXiv:2110.07732, 2021 - arxiv.org
Despite progress across a broad range of applications, Transformers have limited success
in systematic generalization. The situation is especially frustrating in the case of algorithmic …

How BPE affects memorization in transformers

E Kharitonov, M Baroni, D Hupkes - arXiv preprint arXiv:2110.02782, 2021 - arxiv.org
Training data memorization in NLP can be both beneficial (e.g., closed-book QA) and
undesirable (personal data extraction). In any case, successful model training requires a …

The validity of evaluation results: Assessing concurrence across compositionality benchmarks

K Sun, A Williams, D Hupkes - arXiv preprint arXiv:2310.17514, 2023 - arxiv.org
NLP models have progressed drastically in recent years, according to numerous datasets
proposed to evaluate performance. Questions remain, however, about how particular …

Categorizing semantic representations for neural machine translation

Y Yin, Y Li, F Meng, J Zhou, Y Zhang - arXiv preprint arXiv:2210.06709, 2022 - arxiv.org
Modern neural machine translation (NMT) models have achieved competitive performance
on standard benchmarks. However, they have recently been shown to suffer limitations in …