A simple and effective pruning approach for large language models

M Sun, Z Liu, A Bair, JZ Kolter - arXiv preprint arXiv:2306.11695, 2023 - arxiv.org
As their size increases, Large Language Models (LLMs) are natural candidates for network
pruning methods: approaches that drop a subset of network weights while striving to …
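
Below is a minimal sketch of what "dropping a subset of network weights" can look like for a single linear layer, in the spirit of this paper's weight-times-activation importance score. The `act_norm` calibration vector, the 50% sparsity target, and the function name are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

def prune_per_output(weight: torch.Tensor, act_norm: torch.Tensor,
                     sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the lowest-scored weights within each output row.

    `weight` is an (out_features, in_features) linear weight; `act_norm`
    holds per-input-feature activation norms gathered on a small calibration
    set (assumed available). The score |w_ij| * act_norm_j is a simplified
    stand-in for the paper's pruning metric.
    """
    score = weight.abs() * act_norm            # (out, in) importance scores
    k = int(weight.shape[1] * sparsity)        # weights to drop per output row
    _, idx = torch.topk(score, k, dim=1, largest=False)
    mask = torch.ones_like(weight)
    mask.scatter_(1, idx, 0.0)                 # zero the low-scored positions
    return weight * mask

# Toy usage: prune a 4x8 layer to 50% sparsity per output row.
w = torch.randn(4, 8)
norms = torch.rand(8)
w_sparse = prune_per_output(w, norms)
```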

Learning best combination for efficient N:M sparsity

Y Zhang, M Lin, Z Lin, Y Luo, K Li… - Advances in Neural …, 2022 - proceedings.neurips.cc
By forcing N out of M consecutive weights to be non-zero, the recent N:M fine-grained
network sparsity has received increasing attention with its two attractive advantages over …
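
To make the N:M constraint concrete, the sketch below enforces it with a simple magnitude heuristic: keep the N largest-magnitude weights in every group of M consecutive weights along the input dimension (e.g. 2:4). This greedy baseline is only an illustration; the paper's contribution is learning a better combination than such a heuristic. The function name and shapes are assumptions.

```python
import torch

def nm_sparsify(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Keep the n largest-magnitude weights in every group of m consecutive
    weights along the last dimension and zero the rest (e.g. 2:4 sparsity).
    Assumes the last dimension is divisible by m."""
    out_features, in_features = weight.shape
    groups = weight.reshape(out_features, in_features // m, m)
    _, idx = torch.topk(groups.abs(), n, dim=-1)   # rank within each group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, idx, 1.0)
    return (groups * mask).reshape(out_features, in_features)

# Toy usage: enforce 2:4 sparsity on an 8x16 weight matrix.
w = torch.randn(8, 16)
w_24 = nm_sparsify(w, n=2, m=4)
```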

1xN pattern for pruning convolutional neural networks

M Lin, Y Zhang, Y Li, B Chen, F Chao… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
Though network pruning has gained popularity for reducing the complexity of convolutional
neural networks (CNNs), it remains an open issue to concurrently maintain model accuracy …

RGP: Neural network pruning through regular graph with edges swapping

Z Chen, J Xiang, Y Lu, Q Xuan, Z Wang… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Deep learning technology has found a promising application in lightweight model design, for
which pruning is an effective means of achieving a large reduction in both model parameters …

Bi-directional masks for efficient N:M sparse training

Y Zhang, Y Luo, M Lin, Y Zhong, J Xie… - … on machine learning, 2023 - proceedings.mlr.press
We focus on addressing the dense backward propagation issue for training efficiency of N:M
fine-grained sparsity that preserves at most N out of M consecutive weights and achieves …
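
As a rough illustration of the dense backward propagation issue this paper targets, assuming a standard linear layer and an illustrative 2:4 pattern: masking a weight matrix to N:M sparsity along its input dimension makes the forward matmul sparse-friendly, but the transposed weight used to propagate gradients to the input is generally not N:M sparse along its own reduction dimension, so that matmul stays effectively dense. The helper names below are assumptions.

```python
import torch

def nm_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """N:M mask: keep the n largest-magnitude entries in every group of m
    consecutive entries along the last (reduction) dimension."""
    g = weight.reshape(weight.shape[0], -1, m)
    idx = torch.topk(g.abs(), n, dim=-1).indices
    return torch.zeros_like(g).scatter_(-1, idx, 1.0).reshape_as(weight)

def nm_group_fraction(mat: torch.Tensor, n: int = 2, m: int = 4) -> float:
    """Fraction of length-m groups (last dim) with at most n non-zeros."""
    nnz = (mat.reshape(mat.shape[0], -1, m) != 0).sum(dim=-1)
    return (nnz <= n).float().mean().item()

w = torch.randn(64, 64)
w_fwd = w * nm_mask(w)                            # forward weight: N:M along inputs
print(nm_group_fraction(w_fwd))                   # 1.0 -> forward matmul is N:M sparse
print(nm_group_fraction(w_fwd.t().contiguous()))  # typically < 1.0 -> backward is not
```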

Dynamic sparsity is channel-level sparsity learner

L Yin, G Li, M Fang, L Shen, T Huang… - Advances in …, 2024 - proceedings.neurips.cc
Sparse training has received an upsurge of interest in machine learning due to its tantalizing
cost-saving potential for both the entire training process and inference. Dynamic sparse …

STEP: learning N:M structured sparsity masks from scratch with precondition

Y Lu, S Agrawal, S Subramanian… - International …, 2023 - proceedings.mlr.press
Recent innovations in hardware (e.g., Nvidia A100) have motivated learning N:M structured
sparsity masks from scratch for fast model inference. However, state-of-the-art learning …

UPDP: A unified progressive depth pruner for CNN and vision transformer

J Liu, D Tang, Y Huang, L Zhang, X Zeng, D Li… - Proceedings of the …, 2024 - ojs.aaai.org
Traditional channel-wise pruning methods, which reduce network channels, struggle to
effectively prune efficient CNN models with depth-wise convolutional layers and certain …

Dimensionality reduced training by pruning and freezing parts of a deep neural network: a survey

P Wimmer, J Mehnert, AP Condurache - Artificial Intelligence Review, 2023 - Springer
State-of-the-art deep learning models have a parameter count that reaches into the billions.
Training, storing and transferring such models is energy- and time-consuming, and thus costly. A …

Compresso: Structured pruning with collaborative prompting learns compact large language models

S Guo, J Xu, LL Zhang, M Yang - arXiv preprint arXiv:2310.05015, 2023 - arxiv.org
Despite the remarkable success of Large Language Models (LLMs), their massive size poses
significant deployment challenges, particularly on resource-constrained hardware. While …