Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

E Liu, J Zhu, Z Lin, X Ning, MB Blaschko, S Yan, G Dai… - arXiv preprint arXiv…, 2024 - arxiv.org
The rapid advancement of large language models (LLMs) has led to architectures with
billions to trillions of parameters, posing significant deployment challenges due to their …
