A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …
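
Below is a minimal sketch of the top-k expert routing that mixture-of-experts layers build on, added for orientation; it is a generic illustration rather than code from the survey, and the hidden sizes, expert count, and routing loop are hypothetical choices.

```python
# Generic top-k MoE routing sketch (illustrative; sizes and loop structure are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)            # router producing per-expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                      # x: (tokens, d_model)
        scores = self.gate(x)                                  # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)             # keep k experts per token
        weights = F.softmax(weights, dim=-1)                   # normalize the kept scores
        out = torch.zeros_like(x)
        for slot in range(self.k):                             # dense loops for clarity, not speed
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask][:, slot:slot + 1] * expert(x[mask])
        return out

x = torch.randn(16, 512)
print(TopKMoE()(x).shape)  # torch.Size([16, 512])
```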

Shortened LLaMA: A simple depth pruning for large language models

BK Kim, G Kim, TH Kim, T Castells, S Choi… - arXiv preprint arXiv …, 2024 - openreview.net
Structured pruning of modern large language models (LLMs) has emerged as a way of
decreasing their high computational needs. Width pruning reduces the size of projection …
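
The following is a toy sketch of the depth-pruning idea (scoring whole transformer blocks and dropping the least important ones); the cosine-similarity importance proxy, calibration inputs, and keep ratio are assumptions for illustration, not the paper's exact criterion.

```python
# Generic depth (block) pruning sketch; the importance score is an illustrative assumption.
import torch
import torch.nn as nn

def block_importance(block, h):
    """Score a block by how much it changes its input (1 - cosine similarity).
    Blocks whose output barely differs from their input are candidates for removal."""
    with torch.no_grad():
        out = block(h)
        cos = nn.functional.cosine_similarity(h.flatten(1), out.flatten(1), dim=-1)
    return float(1.0 - cos.mean())

def depth_prune(blocks, calib_inputs, keep_ratio=0.75):
    """Return a ModuleList keeping the highest-scoring blocks, original order preserved."""
    scores, h = [], calib_inputs
    for i, blk in enumerate(blocks):
        scores.append((block_importance(blk, h), i))
        with torch.no_grad():
            h = blk(h)                                   # advance the calibration activations
    n_keep = max(1, int(len(blocks) * keep_ratio))
    keep = sorted(i for _, i in sorted(scores, reverse=True)[:n_keep])
    return nn.ModuleList(blocks[i] for i in keep)

# Toy usage: 12 MLP "blocks" pruned down to 9.
blocks = nn.ModuleList(nn.Sequential(nn.Linear(64, 64), nn.GELU()) for _ in range(12))
pruned = depth_prune(blocks, torch.randn(8, 64), keep_ratio=0.75)
print(len(pruned))  # 9
```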

Large language model inference acceleration: A comprehensive hardware perspective

J Li, J Xu, S Huang, Y Chen, W Li, J Liu… - arXiv preprint arXiv …, 2024 - dai.sjtu.edu.cn
Large Language Models (LLMs) have demonstrated remarkable capabilities
across various fields, from natural language understanding to text generation. Compared to …

MoMa: Efficient early-fusion pre-training with mixture of modality-aware experts

XV Lin, A Shrivastava, L Luo, S Iyer, M Lewis… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed
for pre-training mixed-modal, early-fusion language models. MoMa processes images and …

Data curation via joint example selection further accelerates multimodal learning

T Evans, N Parthasarathy, H Merzic… - arXiv preprint arXiv …, 2024 - arxiv.org
Data curation is an essential component of large-scale pretraining. In this work, we
demonstrate that jointly selecting batches of data is more effective for learning than selecting …

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

C Guo, F Cheng, Z Du, J Kiessling, J Ku, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of large language models (LLMs) has significantly transformed the
field of artificial intelligence, demonstrating remarkable capabilities in natural language …

LazyDiT: Lazy learning for the acceleration of diffusion transformers

X Shen, Z Song, Y Zhou, B Chen, Y Li, Y Gong… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion Transformers have emerged as the preeminent models for a wide array of
generative tasks, demonstrating superior performance and efficacy across various …

A deeper look at depth pruning of LLMs

SA Siddiqui, X Dong, G Heinrich, T Breuel… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are not only resource-intensive to train but even more
costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs …

VideoLLM-MoD: Efficient video-language streaming with mixture-of-depths vision computation

S Wu, J Chen, KQ Lin, Q Wang, Y Gao, Q Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while
increasing the number of vision tokens generally enhances visual understanding, it also …

MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Y Luo, G Luo, J Ji, Y Zhou, X Sun, Z Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the significant progress in multimodal large language models (MLLMs), their high
computational cost remains a barrier to real-world deployment. Inspired by the mixture of …
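
The last two entries both build on mixture-of-depths routing, in which a per-layer router sends only a subset of tokens through the block while the rest bypass it via the residual stream. Below is a minimal sketch of that mechanism under assumed shapes, a toy stand-in block, and a fixed capacity fraction; it is not the cited models' implementation.

```python
# Generic mixture-of-depths sketch: only the top-k routed tokens receive full block compute.
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    def __init__(self, d_model=256, capacity=0.5):
        super().__init__()
        self.router = nn.Linear(d_model, 1)          # per-token routing score
        self.block = nn.Sequential(                  # toy stand-in for an attention/FFN block
            nn.LayerNorm(d_model), nn.Linear(d_model, 4 * d_model),
            nn.GELU(), nn.Linear(4 * d_model, d_model))
        self.capacity = capacity                     # fraction of tokens given full compute

    def forward(self, x):                            # x: (batch, seq, d_model)
        b, s, d = x.shape
        k = max(1, int(s * self.capacity))
        scores = self.router(x).squeeze(-1)          # (batch, seq)
        topk = scores.topk(k, dim=-1).indices        # indices of routed tokens
        out = x.clone()                              # unrouted tokens pass through unchanged
        idx = topk.unsqueeze(-1).expand(-1, -1, d)
        routed = torch.gather(x, 1, idx)             # gather the routed tokens
        updated = routed + self.block(routed)        # residual update for routed tokens only
        out.scatter_(1, idx, updated)                # write updates back in place
        return out

x = torch.randn(2, 16, 256)
print(MoDBlock()(x).shape)  # torch.Size([2, 16, 256])
```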