On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Modern language models refute Chomsky's approach to language

S Piantadosi - Lingbuzz Preprint, lingbuzz, 2023 - lingbuzz.net
The rise and success of large language models undermines virtually every strong claim for
the innateness of language that has been proposed by generative linguistics. Modern …

Masked language modeling and the distributional hypothesis: Order word matters pre-training for little

K Sinha, R Jia, D Hupkes, J Pineau, A Williams… - arXiv preprint arXiv …, 2021 - arxiv.org
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …

What artificial neural networks can tell us about human language acquisition

A Warstadt, SR Bowman - Algebraic structures in natural …, 2022 - taylorfrancis.com
Rapid progress in machine learning for natural language processing has the potential to
transform debates about how humans learn language. However, the learning environments …

A discriminative account of the learning, representation and processing of inflection systems

M Ramscar - Language, Cognition and Neuroscience, 2023 - Taylor & Francis
What kind of knowledge accounts for linguistic productivity? How is it acquired? For years,
debate on these questions has focused on a seemingly obscure domain: inflectional …

Word order does matter and shuffled language models know it

M Abdou, V Ravishankar, A Kulmizev… - Proceedings of the 60th …, 2022 - aclanthology.org
Recent studies have shown that language models pretrained and/or fine-tuned on randomly
permuted sentences exhibit competitive performance on GLUE, putting into question the …

Probing for the usage of grammatical number

K Lasri, T Pimentel, A Lenci, T Poibeau… - arXiv preprint arXiv …, 2022 - arxiv.org
A central quest of probing is to uncover how pre-trained models encode a linguistic property
within their representations. An encoding, however, might be spurious, i.e., the model might …

SemAttack: Natural textual attacks via different semantic spaces

B Wang, C Xu, X Liu, Y Cheng, B Li - arXiv preprint arXiv:2205.01287, 2022 - arxiv.org
Recent studies show that pre-trained language models (LMs) are vulnerable to textual
adversarial attacks. However, existing attack methods either suffer from low attack success …

Pretraining with artificial language: Studying transferable knowledge in language models

R Ri, Y Tsuruoka - arXiv preprint arXiv:2203.10326, 2022 - arxiv.org
We investigate what kind of structural knowledge learned in neural network encoders is
transferable to processing natural language. We design artificial languages with structural …

When classifying grammatical role, BERT doesn't care about word order... except when it matters

I Papadimitriou, R Futrell, K Mahowald - arXiv preprint arXiv:2203.06204, 2022 - arxiv.org
Because meaning can often be inferred from lexical semantics alone, word order is often a
redundant cue in natural language. For example, the words chopped, chef, and onion are …