L Lu, Z Wang, R Bao, M Wang, F Li, Y Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing pruning techniques for large language models (LLMs) targeting domain-specific
applications typically follow a two-stage process: pruning the pretrained general-purpose …