A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …
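As a minimal illustration of the simplest criterion covered by pruning taxonomies like this survey's, the sketch below applies one-shot global magnitude pruning to a single weight matrix. This is a generic example with a function name and sparsity level of our own choosing, not an algorithm taken from the survey itself.

import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    # Zero out the smallest-magnitude entries so that roughly `sparsity`
    # (a fraction in [0, 1)) of the weights are removed.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k)[k]      # k-th smallest magnitude
    mask = np.abs(weights) >= threshold       # keep entries at or above the threshold
    return weights * mask

# Example: prune 90% of a random 512x512 weight matrix.
W = np.random.randn(512, 512).astype(np.float32)
W_pruned = magnitude_prune(W, sparsity=0.9)
print("remaining nonzero fraction:", np.count_nonzero(W_pruned) / W.size)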

Applications and techniques for fast machine learning in science

AMC Deiana, N Tran, J Agar, M Blott… - Frontiers in big …, 2022 - frontiersin.org
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …

ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers

Z Yao, R Yazdani Aminabadi… - Advances in …, 2022 - proceedings.neurips.cc
How to efficiently serve ever-larger trained natural language models in practice has become
exceptionally challenging even for powerful cloud servers due to their prohibitive …
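For intuition, here is a minimal sketch of symmetric per-tensor INT8 post-training quantization of a single weight matrix. ZeroQuant itself goes further (finer-grained group-wise weight and token-wise activation quantization, plus layer-by-layer distillation); the function names and sizes below are our own.

import numpy as np

def quantize_symmetric_int8(w: np.ndarray):
    # Symmetric per-tensor post-training quantization to int8.
    # Returns the int8 tensor and the scale needed to dequantize.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

W = np.random.randn(768, 768).astype(np.float32)
Wq, s = quantize_symmetric_int8(W)
err = np.mean((W - dequantize(Wq, s)) ** 2)
print(f"scale={s:.5f}, mean squared quantization error={err:.2e}")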

It's not just size that matters: Small language models are also few-shot learners

T Schick, H Schütze - arXiv preprint arXiv:2009.07118, 2020 - arxiv.org
When scaled to hundreds of billions of parameters, pretrained language models such as
GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous …

I-BERT: Integer-only BERT quantization

S Kim, A Gholami, Z Yao… - … on machine learning, 2021 - proceedings.mlr.press
Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results
in many Natural Language Processing tasks. However, their memory footprint, inference …
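The core of I-BERT is replacing floating-point operations (including GELU, Softmax, and LayerNorm) with integer-only approximations. The sketch below shows only the most basic ingredient of integer-only inference: an int8 matrix multiply accumulated in int32 and requantized through a fixed-point multiplier, with scales and sizes of our own choosing rather than anything taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def quantize(x, scale):
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

# Float reference tensors and per-tensor scales (calibration would supply these).
X = rng.standard_normal((4, 64)).astype(np.float32)
W = rng.standard_normal((64, 64)).astype(np.float32)
sx = np.max(np.abs(X)) / 127.0
sw = np.max(np.abs(W)) / 127.0
sy = np.max(np.abs(X @ W)) / 127.0            # output scale from calibration

Xq, Wq = quantize(X, sx), quantize(W, sw)

# int8 x int8 matmul accumulated in int32, as a real integer kernel would do.
acc = Xq.astype(np.int32) @ Wq.astype(np.int32)

# Requantize to int8 without float math at inference time: the combined scale
# sx*sw/sy is pre-converted offline into a fixed-point multiplier (m, shift).
shift = 15
m = int(round(sx * sw / sy * (1 << shift)))
Yq = np.clip((acc.astype(np.int64) * m) >> shift, -127, 127).astype(np.int8)

print("max abs error vs float:", np.max(np.abs(Yq * sy - X @ W)))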

Block pruning for faster transformers

F Lagunas, E Charlaix, V Sanh, AM Rush - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-training has improved model accuracy for both classification and generation tasks at the
cost of introducing much larger and slower models. Pruning methods have proven to be an …
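The paper removes whole blocks of transformer weight matrices, with the block selection learned during fine-tuning. The sketch below shows only a simpler one-shot variant that drops tiles by L2 norm, with a block size and keep ratio of our own choosing.

import numpy as np

def block_prune(W: np.ndarray, block: int, keep_ratio: float) -> np.ndarray:
    # Remove entire (block x block) tiles of W, keeping the tiles with the
    # largest L2 norm. W's dimensions must be multiples of `block`.
    rows, cols = W.shape[0] // block, W.shape[1] // block
    tiles = W.reshape(rows, block, cols, block)
    norms = np.sqrt((tiles ** 2).sum(axis=(1, 3)))       # (rows, cols) tile norms
    k = int(round(keep_ratio * norms.size))
    threshold = np.sort(norms.ravel())[-k]               # norm of the k-th largest tile
    mask = (norms >= threshold)[:, None, :, None]        # broadcast mask to tile shape
    return (tiles * mask).reshape(W.shape)

W = np.random.randn(768, 768).astype(np.float32)
W_blockpruned = block_prune(W, block=32, keep_ratio=0.25)
print("nonzero fraction:", np.count_nonzero(W_blockpruned) / W.size)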

TernaryBERT: Distillation-aware ultra-low bit BERT

W Zhang, L Hou, Y Yin, L Shang, X Chen… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer-based pre-training models like BERT have achieved remarkable performance
in many natural language processing tasks. However, these models are both computation …
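TernaryBERT combines weight ternarization with distillation from a full-precision teacher; the sketch below covers only the ternarization step, using the common threshold-based (TWN-style) heuristic and names of our own choosing, not the paper's full training procedure.

import numpy as np

def ternarize(w: np.ndarray):
    # Approximate w by alpha * t with t in {-1, 0, +1},
    # using a threshold-based heuristic in the style of Ternary Weight Networks.
    delta = 0.7 * np.mean(np.abs(w))                 # common heuristic threshold
    t = np.where(np.abs(w) > delta, np.sign(w), 0.0)
    alpha = np.abs(w[np.abs(w) > delta]).mean()      # scale for the nonzero entries
    return alpha, t.astype(np.int8)

W = np.random.randn(768, 768).astype(np.float32)
alpha, T = ternarize(W)
approx_err = np.mean((W - alpha * T) ** 2)
print(f"alpha={alpha:.4f}, mean squared error={approx_err:.4f}")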

Utility is in the eye of the user: A critique of NLP leaderboards

K Ethayarajh, D Jurafsky - arXiv preprint arXiv:2009.13888, 2020 - arxiv.org
Benchmarks such as GLUE have helped drive advances in NLP by incentivizing the creation
of more accurate models. While this leaderboard paradigm has been remarkably successful …

Compressing large-scale transformer-based models: A case study on BERT

P Ganesh, Y Chen, X Lou, MA Khan, Y Yang… - Transactions of the …, 2021 - direct.mit.edu
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …

DRONE: Data-aware low-rank compression for large NLP models

P Chen, HF Yu, I Dhillon… - Advances in neural …, 2021 - proceedings.neurips.cc
The representations learned by large-scale NLP models such as BERT have been widely
used in various tasks. However, the increasing model size of the pre-trained models also …
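DRONE's contribution is making the low-rank factorization data-aware, fitting the factors to the actual activation distribution rather than to the weights alone. The sketch below shows only the plain truncated-SVD baseline that such methods improve upon, with shapes and rank of our own choosing.

import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    # Replace W (d_out x d_in) by thin factors A (d_out x rank) and
    # B (rank x d_in) via truncated SVD, so that W @ x ~= A @ (B @ x).
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # fold singular values into the left factor
    B = Vt[:rank, :]
    return A, B

W = np.random.randn(768, 3072).astype(np.float32)
A, B = low_rank_factorize(W, rank=128)
x = np.random.randn(3072).astype(np.float32)
print("relative error:", np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x))

# Parameter count drops from 768*3072 ~= 2.36M to 128*(768+3072) ~= 0.49M.
# (A random matrix compresses poorly; trained weight matrices typically have
# faster-decaying spectra, so the approximation error is much lower in practice.)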