Configurable foundation models: Building LLMs from a modular perspective

C Xiao, Z Zhang, C Song, D Jiang, F Yao, X Han… - arXiv preprint arXiv …, 2024 - arxiv.org
Advancements in LLMs have recently unveiled challenges tied to computational efficiency
and continual scalability due to their huge parameter counts, making the …

Prompt-prompted adaptive structured pruning for efficient LLM generation

H Dong, B Chen, Y Chi - First Conference on Language Modeling, 2024 - openreview.net
Transformer-based large language models (LLMs) have been applied to many fields due to
their remarkable utility, but this comes at a considerable …

Prompt-prompted Mixture of Experts for Efficient LLM Generation

H Dong, B Chen, Y Chi - arXiv preprint arXiv:2404.01365, 2024 - arxiv.org
Transformer-based large language models (LLMs) have been applied to many fields due to
their remarkable utility, but this comes at a considerable …

Conditional computation in neural networks: Principles and research trends

S Scardapane, A Baiocchi, A Devoto… - Intelligenza …, 2024 - journals.sagepub.com
This article summarizes principles and ideas from the emerging area of applying conditional
computation methods to the design of neural networks. In particular, we focus on neural …
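
As a generic illustration of conditional computation (a simple gated-skip form assumed here; the survey covers a much broader taxonomy), the sketch below lets a cheap learned gate decide per input whether an expensive sub-network runs at all. The class name `GatedBlock` and the fixed inference threshold are illustrative assumptions.

```python
# Minimal conditional-computation sketch: a cheap gate decides per input
# whether the expensive path is executed. Not tied to any specific method
# from the survey; names and the threshold value are illustrative.
import torch
import torch.nn as nn


class GatedBlock(nn.Module):
    def __init__(self, d: int = 256, threshold: float = 0.5):
        super().__init__()
        self.gate = nn.Linear(d, 1)                    # cheap routing decision
        self.heavy = nn.Sequential(                    # expensive path
            nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d)
        )
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = torch.sigmoid(self.gate(x))                # (batch, 1) execution probability
        if self.training:
            # Soft gating keeps the decision differentiable during training.
            return x + p * self.heavy(x)
        # At inference, skip the heavy path entirely for "easy" inputs.
        run = p.squeeze(-1) > self.threshold
        out = x.clone()
        if run.any():
            out[run] = x[run] + self.heavy(x[run])
        return out


if __name__ == "__main__":
    block = GatedBlock().eval()
    x = torch.randn(8, 256)
    print(block(x).shape)   # torch.Size([8, 256])
```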

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

R Cai, Y Ro, GW Kim, P Wang, BE Bejnordi… - arXiv preprint arXiv …, 2024 - arxiv.org
The proliferation of large language models (LLMs) has led to the adoption of Mixture-of-
Experts (MoE) architectures that dynamically leverage specialized subnetworks for improved …
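
For orientation only, a generic top-k token-choice MoE layer is sketched below; it is background for the two Read-ME entries in this listing, not a reproduction of their router-decoupled design or system co-design. The class name `TopKMoE`, the expert count, and k are illustrative assumptions.

```python
# Generic top-k token-choice MoE layer (background sketch only; Read-ME's
# router-decoupled architecture is not reproduced here).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d: int = 256, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d). Route each token to its k highest-scoring experts.
        logits = self.router(x)                            # (tokens, n_experts)
        weights, idx = torch.topk(logits, self.k, dim=-1)  # both (tokens, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Which tokens picked expert e, and in which top-k slot.
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out


if __name__ == "__main__":
    moe = TopKMoE()
    tokens = torch.randn(32, 256)
    print(moe(tokens).shape)   # torch.Size([32, 256])
```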

Ada-K Routing: Boosting the Efficiency of MoE-based LLMs

T Yue, L Guo, J Cheng, X Gao, J Liu - arXiv preprint arXiv:2410.10456, 2024 - arxiv.org
In the era of Large Language Models (LLMs), Mixture-of-Experts (MoE) architectures offer a
promising approach to managing computational costs while scaling up model parameters …
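
To make "a variable number of experts per token" concrete, the sketch below keeps adding experts until the router's cumulative probability mass passes a fixed threshold. This cumulative-mass heuristic, the function name `adaptive_expert_selection`, and the cap `max_k` are stand-in assumptions; they are not the learned allocation policy proposed in Ada-K Routing.

```python
# Illustrative per-token adaptive expert counts: add experts until the router's
# cumulative probability mass passes a threshold. A simple stand-in heuristic,
# not the allocation policy learned in Ada-K Routing.
import torch
import torch.nn as nn
import torch.nn.functional as F


def adaptive_expert_selection(router_logits: torch.Tensor,
                              mass_threshold: float = 0.6,
                              max_k: int = 4):
    """Return, for each token, a variable-length list of (expert_id, weight)."""
    probs = F.softmax(router_logits, dim=-1)                 # (tokens, n_experts)
    sorted_p, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_p.cumsum(dim=-1)
    selections = []
    for t in range(probs.shape[0]):
        # Smallest k whose probability mass exceeds the threshold (capped at max_k).
        k = min(int((cum[t] < mass_threshold).sum().item()) + 1, max_k)
        ids = sorted_idx[t, :k]
        w = sorted_p[t, :k] / sorted_p[t, :k].sum()          # renormalize kept mass
        selections.append(list(zip(ids.tolist(), w.tolist())))
    return selections


if __name__ == "__main__":
    router = nn.Linear(256, 8)
    tokens = torch.randn(5, 256)
    for t, sel in enumerate(adaptive_expert_selection(router(tokens))):
        print(f"token {t}: {len(sel)} experts -> {sel}")
```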

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

R Cai, Y Ro, GW Kim, P Wang, BE Bejnordi… - The Thirty-eighth Annual … - openreview.net
The proliferation of large language models (LLMs) has led to the adoption of Mixture-of-
Experts (MoE) architectures that dynamically leverage specialized subnetworks for improved …