Dissociating language and thought in large language models

K Mahowald, AA Ivanova, IA Blank, N Kanwisher… - Trends in Cognitive …, 2024 - cell.com
Large language models (LLMs) have come closest among all models to date to mastering
human language, yet opinions about their linguistic and cognitive capabilities remain split …

Recent advances in natural language processing via large pre-trained language models: A survey

B Min, H Ross, E Sulem, APB Veyseh… - ACM Computing …, 2023 - dl.acm.org
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically
changed the Natural Language Processing (NLP) field. For numerous NLP tasks …

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

Winoground: Probing vision and language models for visio-linguistic compositionality

T Thrush, R Jiang, M Bartolo, A Singh… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present a novel task and dataset for evaluating the ability of vision and language models
to conduct visio-linguistic compositional reasoning, which we call Winoground. Given two …
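
The snippet cuts off before the scoring setup, so the following is a minimal sketch of Winoground-style pairwise matching as it is usually described: two captions and two images that contain the same words in a different order, with a model judged on pairing them correctly. The score(caption, image) function is a stand-in for any image-text matching model; this is illustrative, not code from the paper.

    # Minimal sketch of Winoground-style pairwise scoring (illustrative).
    # `score(caption, image)` should return a higher value for a better match.
    def winoground_scores(score, c0, c1, i0, i1):
        # Text score: each image must prefer its own caption.
        text_ok = score(c0, i0) > score(c1, i0) and score(c1, i1) > score(c0, i1)
        # Image score: each caption must prefer its own image.
        image_ok = score(c0, i0) > score(c0, i1) and score(c1, i1) > score(c1, i0)
        # Group score: both conditions must hold at once.
        return text_ok, image_ok, text_ok and image_ok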

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in low-
resource domains, new tasks, and the popularity of large-scale neural networks that require …
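
As one concrete example of the techniques such surveys cover, here is a minimal sketch of a simple token-level augmentation (random swap, in the spirit of easy-data-augmentation methods); the function name and example sentence are illustrative and not taken from the paper.

    import random

    # Illustrative token-level augmentation: randomly swap word positions.
    def random_swap(sentence, n_swaps=1, seed=None):
        rng = random.Random(seed)
        tokens = sentence.split()
        for _ in range(n_swaps):
            if len(tokens) < 2:
                break
            i, j = rng.sample(range(len(tokens)), 2)
            tokens[i], tokens[j] = tokens[j], tokens[i]
        return " ".join(tokens)

    print(random_swap("data augmentation helps low resource tasks", n_swaps=2, seed=0))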

Modern language models refute Chomsky's approach to language

S Piantadosi - Lingbuzz Preprint, lingbuzz, 2023 - lingbuzz.net
The rise and success of large language models undermines virtually every strong claim for
the innateness of language that has been proposed by generative linguistics. Modern …

What does BERT look at? An analysis of BERT's attention

K Clark, U Khandelwal, O Levy, CD Manning - arXiv preprint arXiv …, 2019 - arxiv.org
Large pre-trained neural networks such as BERT have had great recent success in NLP,
motivating a growing body of research investigating what aspects of language they are able …
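
A minimal sketch of the kind of inspection this line of work performs, assuming the Hugging Face transformers and torch libraries: pull per-head attention maps out of a pretrained BERT and check which token each position attends to most. This is illustrative, not the authors' analysis code, and the layer/head indices are arbitrary.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

    inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions: one (batch, num_heads, seq_len, seq_len) tensor per layer.
    layer, head = 2, 3  # arbitrary head chosen for illustration
    attn = outputs.attentions[layer][0, head]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    for tok, row in zip(tokens, attn):
        print(tok, "->", tokens[int(row.argmax())])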

Masked language modeling and the distributional hypothesis: Order word matters pre-training for little

K Sinha, R Jia, D Hupkes, J Pineau, A Williams… - arXiv preprint arXiv …, 2021 - arxiv.org
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …
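
A minimal sketch of one way to probe this question, assuming the Hugging Face transformers fill-mask pipeline: compare a masked language model's top predictions for a sentence and for a word-order-scrambled variant. Illustrative only, not the authors' experimental setup.

    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")

    natural   = "The chef cooked a delicious [MASK] for dinner."
    scrambled = "delicious The a cooked chef [MASK] for dinner."

    for text in (natural, scrambled):
        top = fill(text, top_k=3)
        print(text)
        print([(p["token_str"], round(p["score"], 3)) for p in top])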

Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned

E Voita, D Talbot, F Moiseev, R Sennrich… - arXiv preprint arXiv …, 2019 - arxiv.org
Multi-head self-attention is a key component of the Transformer, a state-of-the-art
architecture for neural machine translation. In this work we evaluate the contribution made …
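
A minimal sketch of one simple head-importance statistic discussed in this line of work, head "confidence" (the mean of each head's maximum attention weight), computed here from Hugging Face attention outputs; the paper's actual pruning procedure is more involved, so this is illustrative only.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

    inputs = tokenizer("Specialized heads do the heavy lifting.", return_tensors="pt")
    with torch.no_grad():
        attentions = model(**inputs).attentions  # one (1, heads, seq, seq) tensor per layer

    confidence = {}
    for layer, attn in enumerate(attentions):
        # Max over attended-to positions, then mean over query positions.
        conf_per_head = attn[0].max(dim=-1).values.mean(dim=-1)
        for head, c in enumerate(conf_per_head):
            confidence[(layer, head)] = float(c)

    for (layer, head), c in sorted(confidence.items(), key=lambda kv: -kv[1])[:5]:
        print(f"layer {layer} head {head}: confidence {c:.2f}")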

A structural probe for finding syntax in word representations

J Hewitt, CD Manning - Proceedings of the 2019 Conference of …, 2019 - aclanthology.org
Recent work has improved our ability to detect linguistic knowledge in word representations.
However, current methods for detecting syntactic knowledge do not test whether syntax trees …
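
The structural probe learns a linear map B such that squared L2 distances between transformed word vectors approximate distances between the corresponding words in the parse tree. Below is a minimal PyTorch sketch of that objective, with random tensors standing in for BERT representations and gold tree distances; it is not the released implementation.

    import torch

    torch.manual_seed(0)
    hidden_dim, probe_rank, seq_len = 768, 64, 6

    # Linear probe parameters B, trained so that ||B(h_i - h_j)||^2 ~ tree distance.
    B = torch.nn.Parameter(torch.randn(probe_rank, hidden_dim) * 0.01)
    optimizer = torch.optim.Adam([B], lr=1e-3)

    h = torch.randn(seq_len, hidden_dim)          # stand-in for BERT word vectors
    tree_dist = torch.randint(1, 5, (seq_len, seq_len)).float()
    tree_dist = (tree_dist + tree_dist.T) / 2     # stand-in for gold parse-tree distances
    tree_dist.fill_diagonal_(0)

    for step in range(100):
        proj = h @ B.T                            # (seq_len, probe_rank)
        diff = proj.unsqueeze(1) - proj.unsqueeze(0)
        pred_dist = (diff ** 2).sum(-1)           # squared L2 distance for every word pair
        loss = (pred_dist - tree_dist).abs().mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()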