Dissociating language and thought in large language models

K Mahowald, AA Ivanova, IA Blank, N Kanwisher… - Trends in Cognitive …, 2024 - cell.com
Large language models (LLMs) have come closest among all models to date to mastering
human language, yet opinions about their linguistic and cognitive capabilities remain split …

Recent advances in natural language processing via large pre-trained language models: A survey

B Min, H Ross, E Sulem, APB Veyseh… - ACM Computing …, 2023 - dl.acm.org
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically
changed the Natural Language Processing (NLP) field. For numerous NLP tasks …

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

Winoground: Probing vision and language models for visio-linguistic compositionality

T Thrush, R Jiang, M Bartolo, A Singh… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present a novel task and dataset for evaluating the ability of vision and language models
to conduct visio-linguistic compositional reasoning, which we call Winoground. Given two …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Pre-trained language models and their applications

H Wang, J Li, H Wu, E Hovy, Y Sun - Engineering, 2022 - Elsevier
Pre-trained language models have achieved striking success in natural language
processing (NLP), leading to a paradigm shift from supervised learning to pre-training …

Post-hoc interpretability for neural nlp: A survey

A Madsen, S Reddy, S Chandar - ACM Computing Surveys, 2022 - dl.acm.org
Neural networks for NLP are becoming increasingly complex and widespread, and there is a
growing concern about whether these models are responsible to use. Explaining models helps to address …

Measuring and improving consistency in pretrained language models

Y Elazar, N Kassner, S Ravfogel… - Transactions of the …, 2021 - direct.mit.edu
Consistency of a model—that is, the invariance of its behavior under meaning-preserving
alternations in its input—is a highly desirable property in natural language processing. In …

Quantifying attention flow in transformers

S Abnar, W Zuidema - arXiv preprint arXiv:2005.00928, 2020 - arxiv.org
In the Transformer model, "self-attention" combines information from attended embeddings
into the representation of the focal embedding in the next layer. Thus, across layers of the …
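
The snippet above motivates the paper's attention-rollout idea: because each layer's self-attention remixes the embeddings, the raw attention weights of any single layer understate how information actually propagates from input tokens to a final representation. Below is a minimal sketch of attention rollout under assumed conventions (head-averaged attention and a 0.5/0.5 mix with the identity to stand in for the residual connection); the function name and array shapes are illustrative, not the authors' code.

```python
import numpy as np

def attention_rollout(attentions):
    """Approximate token-to-token information flow across Transformer layers.

    attentions: list of per-layer arrays of shape (heads, seq, seq),
                each row a probability distribution over attended tokens.
    Returns an array of shape (seq, seq) with rolled-out attention.
    """
    rollout = None
    for layer_attn in attentions:
        a = layer_attn.mean(axis=0)                  # average attention over heads
        a = 0.5 * a + 0.5 * np.eye(a.shape[-1])      # mix in identity for the residual path
        a = a / a.sum(axis=-1, keepdims=True)        # renormalize rows to sum to 1
        rollout = a if rollout is None else a @ rollout  # compose with earlier layers
    return rollout

# Toy usage: random row-normalized "attention" for a 2-layer, 4-head, 5-token model.
rng = np.random.default_rng(0)
attns = [rng.random((4, 5, 5)) for _ in range(2)]
attns = [a / a.sum(axis=-1, keepdims=True) for a in attns]
print(attention_rollout(attns).shape)  # (5, 5)
```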