A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Pre-trained models for natural language processing: A survey

X Qiu, T Sun, Y Xu, Y Shao, N Dai, X Huang - Science China …, 2020 - Springer
Recently, the emergence of pre-trained models (PTMs) has brought natural language
processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs …

A primer in BERTology: What we know about how BERT works

A Rogers, O Kovaleva, A Rumshisky - Transactions of the Association …, 2021 - direct.mit.edu
Transformer-based models have pushed the state of the art in many areas of NLP, but our
understanding of what is behind their success is still limited. This paper is the first survey of …

MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained Transformers

W Wang, F Wei, L Dong, H Bao… - Advances in Neural …, 2020 - proceedings.neurips.cc
Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its variants) have achieved
remarkable success in a variety of NLP tasks. However, these models usually consist of …

MobileBERT: A compact task-agnostic BERT for resource-limited devices

Z Sun, H Yu, X Song, R Liu, Y Yang, D Zhou - arXiv preprint arXiv …, 2020 - arxiv.org
Natural Language Processing (NLP) has recently achieved great success by using huge
pre-trained models with hundreds of millions of parameters. However, these models suffer from …

GoEmotions: A dataset of fine-grained emotions

D Demszky, D Movshovitz-Attias, J Ko, A Cowen… - arXiv preprint arXiv …, 2020 - arxiv.org
Understanding emotion expressed in language has a wide range of applications, from
building empathetic chatbots to detecting harmful online behavior. Advancement in this area …

Probing pretrained language models for lexical semantics

I Vulić, EM Ponti, R Litschko, G Glavaš… - Proceedings of the …, 2020 - aclanthology.org
The success of large pretrained language models (LMs) such as BERT and RoBERTa has
sparked interest in probing their representations, in order to unveil what types of knowledge …

Pretrained language models for biomedical and clinical tasks: Understanding and extending the state-of-the-art

P Lewis, M Ott, J Du, V Stoyanov - Proceedings of the 3rd clinical …, 2020 - aclanthology.org
A large array of pretrained models are available to the biomedical NLP (BioNLP) community.
Finding the best model for a particular task can be difficult and time-consuming. For many …

MiniLMv2: Multi-head self-attention relation distillation for compressing pretrained Transformers

W Wang, H Bao, S Huang, L Dong, F Wei - arXiv preprint arXiv …, 2020 - arxiv.org
We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using
self-attention relation distillation for task-agnostic compression of pretrained Transformers. In …

Structured pruning of large language models

Z Wang, J Wohlwend, T Lei - arXiv preprint arXiv:1910.04732, 2019 - arxiv.org
Large language models have recently achieved state of the art performance across a wide
variety of natural language tasks. Meanwhile, the size of these models and their latency …