Y Song, H Xie, Z Zhang, B Wen, L Ma, Z Mi… - arXiv preprint arXiv …, 2024 - arxiv.org
Exploiting activation sparsity is a promising approach to significantly accelerating the
inference process of large language models (LLMs) without compromising performance …