A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Pre-trained models for natural language processing: A survey

X Qiu, T Sun, Y Xu, Y Shao, N Dai, X Huang - Science China …, 2020 - Springer
Recently, the emergence of pre-trained models (PTMs) has brought natural language
processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs …

A primer in BERTology: What we know about how BERT works

A Rogers, O Kovaleva, A Rumshisky - Transactions of the Association …, 2021 - direct.mit.edu
Transformer-based models have pushed the state of the art in many areas of NLP, but our
understanding of what is behind their success is still limited. This paper is the first survey of …

MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained Transformers

W Wang, F Wei, L Dong, H Bao… - Advances in Neural …, 2020 - proceedings.neurips.cc
Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its variants) have achieved
remarkable success in a variety of NLP tasks. However, these models usually consist of …

MobileBERT: A compact task-agnostic BERT for resource-limited devices

Z Sun, H Yu, X Song, R Liu, Y Yang, D Zhou - arXiv preprint arXiv …, 2020 - arxiv.org
Natural Language Processing (NLP) has recently achieved great success by using huge
pre-trained models with hundreds of millions of parameters. However, these models suffer from …

GoEmotions: A dataset of fine-grained emotions

D Demszky, D Movshovitz-Attias, J Ko, A Cowen… - arXiv preprint arXiv …, 2020 - arxiv.org
Understanding emotion expressed in language has a wide range of applications, from
building empathetic chatbots to detecting harmful online behavior. Advancement in this area …

Probing pretrained language models for lexical semantics

I Vulić, EM Ponti, R Litschko, G Glavaš… - Proceedings of the …, 2020 - aclanthology.org
The success of large pretrained language models (LMs) such as BERT and RoBERTa has
sparked interest in probing their representations, in order to unveil what types of knowledge …

Pretrained language models for biomedical and clinical tasks: Understanding and extending the state-of-the-art

P Lewis, M Ott, J Du, V Stoyanov - Proceedings of the 3rd clinical …, 2020 - aclanthology.org
A large array of pretrained models are available to the biomedical NLP (BioNLP) community.
Finding the best model for a particular task can be difficult and time-consuming. For many …

MiniLMv2: Multi-head self-attention relation distillation for compressing pretrained Transformers

W Wang, H Bao, S Huang, L Dong, F Wei - arXiv preprint arXiv …, 2020 - arxiv.org
We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using
self-attention relation distillation for task-agnostic compression of pretrained Transformers. In …

Structured pruning of large language models

Z Wang, J Wohlwend, T Lei - arXiv preprint arXiv:1910.04732, 2019 - arxiv.org
Large language models have recently achieved state of the art performance across a wide
variety of natural language tasks. Meanwhile, the size of these models and their latency …