On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

A taxonomy and review of generalization research in NLP

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - Nature Machine …, 2023 - nature.com
The ability to generalize well is one of the primary desiderata for models of natural language
processing (NLP), but what 'good generalization' entails and how it should be evaluated is …

State-of-the-art generalisation research in NLP: a taxonomy and review

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …

The paradox of the compositionality of natural language: A neural machine translation case study

V Dankers, E Bruni, D Hupkes - arXiv preprint arXiv:2108.05885, 2021 - arxiv.org
Obtaining human-like performance in NLP is often argued to require compositional
generalisation. Whether neural networks exhibit this ability is usually studied by training …

Can transformer be too compositional? Analysing idiom processing in neural machine translation

V Dankers, CG Lucas, I Titov - arXiv preprint arXiv:2205.15301, 2022 - arxiv.org
Unlike literal expressions, idioms' meanings do not directly follow from their parts, posing a
challenge for neural machine translation (NMT). NMT models are often unable to translate …

Sequence-to-sequence learning with latent neural grammars

Y Kim - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Sequence-to-sequence learning with neural networks has become the de facto standard for
sequence modeling. This approach typically models the local distribution over the next …

The neural data router: Adaptive control flow in transformers improves systematic generalization

R Csordás, K Irie, J Schmidhuber - arXiv preprint arXiv:2110.07732, 2021 - arxiv.org
Despite progress across a broad range of applications, Transformers have limited success
in systematic generalization. The situation is especially frustrating in the case of algorithmic …

How BPE affects memorization in transformers

E Kharitonov, M Baroni, D Hupkes - arXiv preprint arXiv:2110.02782, 2021 - arxiv.org
Training data memorization in NLP can be both beneficial (e.g., closed-book QA) and
undesirable (personal data extraction). In any case, successful model training requires a …

The validity of evaluation results: Assessing concurrence across compositionality benchmarks

K Sun, A Williams, D Hupkes - arXiv preprint arXiv:2310.17514, 2023 - arxiv.org
NLP models have progressed drastically in recent years, according to numerous datasets
proposed to evaluate performance. Questions remain, however, about how particular …

Categorizing semantic representations for neural machine translation

Y Yin, Y Li, F Meng, J Zhou, Y Zhang - arXiv preprint arXiv:2210.06709, 2022 - arxiv.org
Modern neural machine translation (NMT) models have achieved competitive performance
on standard benchmarks. However, they have recently been shown to suffer limitations in …