A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …
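
Below is a minimal sketch of the top-k expert routing that mixture-of-experts layers build on, added for orientation; it is a generic illustration rather than code from the survey, and the hidden sizes, expert count, and routing loop are hypothetical choices.

```python
# Generic top-k MoE routing sketch (illustrative; sizes and loop structure are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)            # router producing per-expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                      # x: (tokens, d_model)
        scores = self.gate(x)                                  # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)             # keep k experts per token
        weights = F.softmax(weights, dim=-1)                   # normalize the kept scores
        out = torch.zeros_like(x)
        for slot in range(self.k):                             # dense loops for clarity, not speed
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask][:, slot:slot + 1] * expert(x[mask])
        return out

x = torch.randn(16, 512)
print(TopKMoE()(x).shape)  # torch.Size([16, 512])
```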

Shortened LLaMA: A simple depth pruning for large language models

BK Kim, G Kim, TH Kim, T Castells, S Choi… - arXiv preprint arXiv …, 2024 - openreview.net
Structured pruning of modern large language models (LLMs) has emerged as a way of
decreasing their high computational needs. Width pruning reduces the size of projection …
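
The following is a toy sketch of the depth-pruning idea (scoring whole transformer blocks and dropping the least important ones); the cosine-similarity importance proxy, calibration inputs, and keep ratio are assumptions for illustration, not the paper's exact criterion.

```python
# Generic depth (block) pruning sketch; the importance score is an illustrative assumption.
import torch
import torch.nn as nn

def block_importance(block, h):
    """Score a block by how much it changes its input (1 - cosine similarity).
    Blocks whose output barely differs from their input are candidates for removal."""
    with torch.no_grad():
        out = block(h)
        cos = nn.functional.cosine_similarity(h.flatten(1), out.flatten(1), dim=-1)
    return float(1.0 - cos.mean())

def depth_prune(blocks, calib_inputs, keep_ratio=0.75):
    """Return a ModuleList keeping the highest-scoring blocks, original order preserved."""
    scores, h = [], calib_inputs
    for i, blk in enumerate(blocks):
        scores.append((block_importance(blk, h), i))
        with torch.no_grad():
            h = blk(h)                                   # advance the calibration activations
    n_keep = max(1, int(len(blocks) * keep_ratio))
    keep = sorted(i for _, i in sorted(scores, reverse=True)[:n_keep])
    return nn.ModuleList(blocks[i] for i in keep)

# Toy usage: 12 MLP "blocks" pruned down to 9.
blocks = nn.ModuleList(nn.Sequential(nn.Linear(64, 64), nn.GELU()) for _ in range(12))
pruned = depth_prune(blocks, torch.randn(8, 64), keep_ratio=0.75)
print(len(pruned))  # 9
```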

Large language model inference acceleration: A comprehensive hardware perspective

J Li, J Xu, S Huang, Y Chen, W Li, J Liu… - arXiv preprint arXiv …, 2024 - dai.sjtu.edu.cn
Large Language Models (LLMs) have demonstrated remarkable capabilities
across various fields, from natural language understanding to text generation. Compared to …

MoMa: Efficient early-fusion pre-training with mixture of modality-aware experts

XV Lin, A Shrivastava, L Luo, S Iyer, M Lewis… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed
for pre-training mixed-modal, early-fusion language models. MoMa processes images and …

Data curation via joint example selection further accelerates multimodal learning

T Evans, N Parthasarathy, H Merzic… - arXiv preprint arXiv …, 2024 - arxiv.org
Data curation is an essential component of large-scale pretraining. In this work, we
demonstrate that jointly selecting batches of data is more effective for learning than selecting …

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

C Guo, F Cheng, Z Du, J Kiessling, J Ku, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of large language models (LLMs) has significantly transformed the
field of artificial intelligence, demonstrating remarkable capabilities in natural language …

LazyDiT: Lazy learning for the acceleration of diffusion transformers

X Shen, Z Song, Y Zhou, B Chen, Y Li, Y Gong… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion Transformers have emerged as the preeminent models for a wide array of
generative tasks, demonstrating superior performance and efficacy across various …

A deeper look at depth pruning of LLMs

SA Siddiqui, X Dong, G Heinrich, T Breuel… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are not only resource-intensive to train but even more
costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs …

VideoLLM-MoD: Efficient video-language streaming with mixture-of-depths vision computation

S Wu, J Chen, KQ Lin, Q Wang, Y Gao, Q Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while
increasing the number of vision tokens generally enhances visual understanding, it also …

MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Y Luo, G Luo, J Ji, Y Zhou, X Sun, Z Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the significant progress in multimodal large language models (MLLMs), their high
computational cost remains a barrier to real-world deployment. Inspired by the mixture of …
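
The last two entries both build on mixture-of-depths routing, in which a per-layer router sends only a subset of tokens through the block while the rest bypass it via the residual stream. Below is a minimal sketch of that mechanism under assumed shapes, a toy stand-in block, and a fixed capacity fraction; it is not the cited models' implementation.

```python
# Generic mixture-of-depths sketch: only the top-k routed tokens receive full block compute.
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    def __init__(self, d_model=256, capacity=0.5):
        super().__init__()
        self.router = nn.Linear(d_model, 1)          # per-token routing score
        self.block = nn.Sequential(                  # toy stand-in for an attention/FFN block
            nn.LayerNorm(d_model), nn.Linear(d_model, 4 * d_model),
            nn.GELU(), nn.Linear(4 * d_model, d_model))
        self.capacity = capacity                     # fraction of tokens given full compute

    def forward(self, x):                            # x: (batch, seq, d_model)
        b, s, d = x.shape
        k = max(1, int(s * self.capacity))
        scores = self.router(x).squeeze(-1)          # (batch, seq)
        topk = scores.topk(k, dim=-1).indices        # indices of routed tokens
        out = x.clone()                              # unrouted tokens pass through unchanged
        idx = topk.unsqueeze(-1).expand(-1, -1, d)
        routed = torch.gather(x, 1, idx)             # gather the routed tokens
        updated = routed + self.block(routed)        # residual update for routed tokens only
        out.scatter_(1, idx, updated)                # write updates back in place
        return out

x = torch.randn(2, 16, 256)
print(MoDBlock()(x).shape)  # torch.Size([2, 16, 256])
```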