Learn to be efficient: Build structured sparsity in large language models

H Zheng, X Bai, B Chen, F Lai, A Prakash - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have achieved remarkable success with their billions of
parameters, yet they incur high inference overheads. The emergence of activation sparsity in …

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

JY Lee, D Lee, G Zhang, M Tiwari… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have dramatically advanced AI applications, yet their
deployment remains challenging due to their immense inference costs. Recent studies …
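
Note: the title points to threshold-based sparsification of MLP activations, i.e. dropping small gate values so the corresponding neurons contribute nothing downstream. The sketch below only illustrates that general idea in a gated MLP; the cutoff `tau` and all tensor names are hypothetical, and the paper's contextually derived thresholds are not reproduced.

```python
import torch
import torch.nn.functional as F

def thresholded_gated_mlp(x, w_gate, w_up, w_down, tau=0.1):
    """Illustrative gated-MLP forward pass that zeroes small gate
    activations before they reach the up/down projections.

    `tau` is a hypothetical magnitude cutoff, not the paper's
    statistics-derived threshold.
    """
    gate = F.silu(x @ w_gate)                                        # gating activations
    gate = torch.where(gate.abs() >= tau, gate, torch.zeros_like(gate))
    return (gate * (x @ w_up)) @ w_down                              # only surviving neurons contribute
```

In a real deployment the zeroed neurons would let the matching rows of `w_up` and `w_down` be skipped entirely, which is where the speedup comes from.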

Sparsity-Accelerated Training for Large Language Models

D Ma, L Chen, P Wang, H Xu, H Li, L Sun, S Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated proficiency across various natural
language processing (NLP) tasks but often require additional training, such as continual pre …

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Y Song, H Xie, Z Zhang, B Wen, L Ma, Z Mi… - arXiv preprint arXiv …, 2024 - arxiv.org
Exploiting activation sparsity is a promising approach to significantly accelerating the
inference process of large language models (LLMs) without compromising performance …
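
Note: the acceleration argument behind this line of work is that a neuron whose activation is exactly zero contributes nothing to the following projection, so its weight rows can be skipped. A toy single-token sketch of that reasoning (not the paper's kernel, which would be fused on the GPU rather than indexed in Python):

```python
import torch

def sparse_down_proj(act, w_down):
    """Compute act @ w_down while skipping rows of w_down whose
    corresponding activation is zero (single-token case).

    Illustrates why activation sparsity saves work; `act` is a 1-D
    vector of intermediate activations, `w_down` is (d_ff, d_model).
    """
    nz = act.nonzero(as_tuple=True)[0]   # indices of active neurons
    return act[nz] @ w_down[nz]          # touch only the active rows
```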

Relu strikes back: Exploiting activation sparsity in large language models

I Mirzadeh, K Alizadeh, S Mehta, CC Del Mundo… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) with billions of parameters have drastically transformed AI
applications. However, their demanding computation during inference has raised significant …

ReLU Wins: Discovering Efficient Activation Functions for Sparse LLMs

Z Zhang, Y Song, G Yu, X Han, Y Lin, C Xiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Sparse computation offers a compelling solution for the inference of Large Language
Models (LLMs) in low-resource scenarios by dynamically skipping the computation of …

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

H Wang, S Ma, R Wang, F Wei - arXiv preprint arXiv:2407.10969, 2024 - arxiv.org
We introduce Q-Sparse, a simple yet effective approach to training sparsely-activated large
language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs which can …
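
Note: a common way to enforce a fixed activation budget per token is top-K magnitude selection. The sketch below shows only that generic mechanism, with `k` as a hypothetical hyperparameter; Q-Sparse's training details, such as how gradients pass through the zeroed entries, are not reproduced.

```python
import torch

def topk_sparsify(act, k):
    """Keep the k largest-magnitude activations per token, zero the rest.

    `k` is a hypothetical per-token budget; gradient handling for the
    zeroed entries during training is omitted.
    """
    _, idx = act.abs().topk(k, dim=-1)                   # indices of the k largest entries
    mask = torch.zeros_like(act).scatter(-1, idx, 1.0)   # 1 where kept, 0 elsewhere
    return act * mask
```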

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models

C Song, X Han, Z Zhang, S Hu, X Shi, K Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Activation sparsity refers to the presence of many weakly contributing elements
among activation outputs. As a prevalent property of the models using the ReLU activation …
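
Note: under this definition, the activation sparsity of a ReLU MLP can be read off as the share of intermediate activations that are exactly zero. A minimal way to measure it on a batch of hidden states (metric and names are illustrative, not necessarily the paper's own evaluation):

```python
import torch
import torch.nn.functional as F

def activation_sparsity(hidden, w_in):
    """Fraction of exactly-zero intermediate activations under ReLU,
    averaged over a batch of hidden states.

    This is the common 'share of zeros' notion of activation sparsity;
    thresholded variants would count near-zero values as well.
    """
    acts = F.relu(hidden @ w_in)              # ReLU produces true zeros
    return (acts == 0).float().mean().item()
```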

Achieving Sparse Activation in Small Language Models

J Song, K Huang, X Yin, B Yang, W Gao - arXiv preprint arXiv:2406.06562, 2024 - arxiv.org
Sparse activation, which selectively activates only an input-dependent set of neurons in
inference, is a useful technique to reduce the computing cost of Large Language Models …

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

X Lu, A Zhou, Y Xu, R Zhang, P Gao, H Li - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have become pivotal in advancing the field of artificial
intelligence, yet their immense sizes pose significant challenges for both fine-tuning and …