A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …
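As a minimal illustration of the simplest criterion covered by pruning taxonomies like this survey's, the sketch below applies one-shot global magnitude pruning to a single weight matrix. This is a generic example with a function name and sparsity level of our own choosing, not an algorithm taken from the survey itself.

import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    # Zero out the smallest-magnitude entries so that roughly `sparsity`
    # (a fraction in [0, 1)) of the weights are removed.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k)[k]      # k-th smallest magnitude
    mask = np.abs(weights) >= threshold       # keep entries at or above the threshold
    return weights * mask

# Example: prune 90% of a random 512x512 weight matrix.
W = np.random.randn(512, 512).astype(np.float32)
W_pruned = magnitude_prune(W, sparsity=0.9)
print("remaining nonzero fraction:", np.count_nonzero(W_pruned) / W.size)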

Applications and techniques for fast machine learning in science

AMC Deiana, N Tran, J Agar, M Blott… - Frontiers in big …, 2022 - frontiersin.org
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …

ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers

Z Yao, R Yazdani Aminabadi… - Advances in …, 2022 - proceedings.neurips.cc
How to efficiently serve ever-larger trained natural language models in practice has become
exceptionally challenging even for powerful cloud servers due to their prohibitive …
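For intuition, here is a minimal sketch of symmetric per-tensor INT8 post-training quantization of a single weight matrix. ZeroQuant itself goes further (finer-grained group-wise weight and token-wise activation quantization, plus layer-by-layer distillation); the function names and sizes below are our own.

import numpy as np

def quantize_symmetric_int8(w: np.ndarray):
    # Symmetric per-tensor post-training quantization to int8.
    # Returns the int8 tensor and the scale needed to dequantize.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

W = np.random.randn(768, 768).astype(np.float32)
Wq, s = quantize_symmetric_int8(W)
err = np.mean((W - dequantize(Wq, s)) ** 2)
print(f"scale={s:.5f}, mean squared quantization error={err:.2e}")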

It's not just size that matters: Small language models are also few-shot learners

T Schick, H Schütze - arXiv preprint arXiv:2009.07118, 2020 - arxiv.org
When scaled to hundreds of billions of parameters, pretrained language models such as
GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous …

I-BERT: Integer-only BERT quantization

S Kim, A Gholami, Z Yao… - … on machine learning, 2021 - proceedings.mlr.press
Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results
in many Natural Language Processing tasks. However, their memory footprint, inference …
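The core of I-BERT is replacing floating-point operations (including GELU, Softmax, and LayerNorm) with integer-only approximations. The sketch below shows only the most basic ingredient of integer-only inference: an int8 matrix multiply accumulated in int32 and requantized through a fixed-point multiplier, with scales and sizes of our own choosing rather than anything taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def quantize(x, scale):
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

# Float reference tensors and per-tensor scales (calibration would supply these).
X = rng.standard_normal((4, 64)).astype(np.float32)
W = rng.standard_normal((64, 64)).astype(np.float32)
sx = np.max(np.abs(X)) / 127.0
sw = np.max(np.abs(W)) / 127.0
sy = np.max(np.abs(X @ W)) / 127.0            # output scale from calibration

Xq, Wq = quantize(X, sx), quantize(W, sw)

# int8 x int8 matmul accumulated in int32, as a real integer kernel would do.
acc = Xq.astype(np.int32) @ Wq.astype(np.int32)

# Requantize to int8 without float math at inference time: the combined scale
# sx*sw/sy is pre-converted offline into a fixed-point multiplier (m, shift).
shift = 15
m = int(round(sx * sw / sy * (1 << shift)))
Yq = np.clip((acc.astype(np.int64) * m) >> shift, -127, 127).astype(np.int8)

print("max abs error vs float:", np.max(np.abs(Yq * sy - X @ W)))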

Block pruning for faster transformers

F Lagunas, E Charlaix, V Sanh, AM Rush - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-training has improved model accuracy for both classification and generation tasks at the
cost of introducing much larger and slower models. Pruning methods have proven to be an …
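The paper removes whole blocks of transformer weight matrices, with the block selection learned during fine-tuning. The sketch below shows only a simpler one-shot variant that drops tiles by L2 norm, with a block size and keep ratio of our own choosing.

import numpy as np

def block_prune(W: np.ndarray, block: int, keep_ratio: float) -> np.ndarray:
    # Remove entire (block x block) tiles of W, keeping the tiles with the
    # largest L2 norm. W's dimensions must be multiples of `block`.
    rows, cols = W.shape[0] // block, W.shape[1] // block
    tiles = W.reshape(rows, block, cols, block)
    norms = np.sqrt((tiles ** 2).sum(axis=(1, 3)))       # (rows, cols) tile norms
    k = int(round(keep_ratio * norms.size))
    threshold = np.sort(norms.ravel())[-k]               # norm of the k-th largest tile
    mask = (norms >= threshold)[:, None, :, None]        # broadcast mask to tile shape
    return (tiles * mask).reshape(W.shape)

W = np.random.randn(768, 768).astype(np.float32)
W_blockpruned = block_prune(W, block=32, keep_ratio=0.25)
print("nonzero fraction:", np.count_nonzero(W_blockpruned) / W.size)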

TernaryBERT: Distillation-aware ultra-low bit BERT

W Zhang, L Hou, Y Yin, L Shang, X Chen… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer-based pre-training models like BERT have achieved remarkable performance
in many natural language processing tasks. However, these models are both computation …
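TernaryBERT combines weight ternarization with distillation from a full-precision teacher; the sketch below covers only the ternarization step, using the common threshold-based (TWN-style) heuristic and names of our own choosing, not the paper's full training procedure.

import numpy as np

def ternarize(w: np.ndarray):
    # Approximate w by alpha * t with t in {-1, 0, +1},
    # using a threshold-based heuristic in the style of Ternary Weight Networks.
    delta = 0.7 * np.mean(np.abs(w))                 # common heuristic threshold
    t = np.where(np.abs(w) > delta, np.sign(w), 0.0)
    alpha = np.abs(w[np.abs(w) > delta]).mean()      # scale for the nonzero entries
    return alpha, t.astype(np.int8)

W = np.random.randn(768, 768).astype(np.float32)
alpha, T = ternarize(W)
approx_err = np.mean((W - alpha * T) ** 2)
print(f"alpha={alpha:.4f}, mean squared error={approx_err:.4f}")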

Utility is in the eye of the user: A critique of NLP leaderboards

K Ethayarajh, D Jurafsky - arXiv preprint arXiv:2009.13888, 2020 - arxiv.org
Benchmarks such as GLUE have helped drive advances in NLP by incentivizing the creation
of more accurate models. While this leaderboard paradigm has been remarkably successful …

Compressing large-scale transformer-based models: A case study on BERT

P Ganesh, Y Chen, X Lou, MA Khan, Y Yang… - Transactions of the …, 2021 - direct.mit.edu
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …

DRONE: Data-aware low-rank compression for large NLP models

P Chen, HF Yu, I Dhillon… - Advances in neural …, 2021 - proceedings.neurips.cc
The representations learned by large-scale NLP models such as BERT have been widely
used in various tasks. However, the increasing model size of the pre-trained models also …
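DRONE's contribution is making the low-rank factorization data-aware, fitting the factors to the actual activation distribution rather than to the weights alone. The sketch below shows only the plain truncated-SVD baseline that such methods improve upon, with shapes and rank of our own choosing.

import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    # Replace W (d_out x d_in) by thin factors A (d_out x rank) and
    # B (rank x d_in) via truncated SVD, so that W @ x ~= A @ (B @ x).
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # fold singular values into the left factor
    B = Vt[:rank, :]
    return A, B

W = np.random.randn(768, 3072).astype(np.float32)
A, B = low_rank_factorize(W, rank=128)
x = np.random.randn(3072).astype(np.float32)
print("relative error:", np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x))

# Parameter count drops from 768*3072 ~= 2.36M to 128*(768+3072) ~= 0.49M.
# (A random matrix compresses poorly; trained weight matrices typically have
# faster-decaying spectra, so the approximation error is much lower in practice.)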