LLM inference serving: Survey of recent advances and opportunities

B Li, Y Jiang, V Gadepally, D Tiwari - arXiv preprint arXiv:2407.12391, 2024 - arxiv.org
This survey offers a comprehensive overview of recent advancements in Large Language
Model (LLM) serving systems, focusing on research since the year 2023. We specifically …

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

C Guo, F Cheng, Z Du, J Kiessling, J Ku, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of large language models (LLMs) has significantly transformed the
field of artificial intelligence, demonstrating remarkable capabilities in natural language …

ExpertFlow: Optimized expert activation and token allocation for efficient mixture-of-experts inference

X He, S Zhang, Y Wang, H Yin, Z Zeng, S Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
Sparse Mixture of Experts (MoE) models, while outperforming dense Large Language
Models (LLMs) in terms of performance, face significant deployment challenges during …
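To make the "sparse MoE" notion in the entries above and below concrete, here is a minimal, generic sketch of top-k expert routing in Python. It is not taken from ExpertFlow or any other listed paper; the dimensions, the value of k, and the linear gate are arbitrary illustrative choices.

```python
# Minimal sketch of sparse top-k expert routing, the mechanism shared by the
# MoE systems surveyed in this listing. Generic illustration only; not the
# routing scheme of any specific paper above.
import numpy as np

def top_k_gating(x, gate_w, k=2):
    """Route one token `x` to k of E experts via a learned linear gate."""
    logits = x @ gate_w                       # (E,) gate scores for E experts
    top = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over the selected experts only
    return top, weights

rng = np.random.default_rng(0)
d_model, n_experts = 16, 8                    # arbitrary toy sizes
x = rng.standard_normal(d_model)
gate_w = rng.standard_normal((d_model, n_experts))
experts, weights = top_k_gating(x, gate_w)
# Only k expert FFNs would run for this token; the other E-k stay idle, which
# is the source of both the compute savings and the deployment/memory issues
# these papers address.
print(experts, weights)
```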

APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes

Y Wei, J Du, J Jiang, X Shi, X Zhang… - … Conference for High …, 2024 - ieeexplore.ieee.org
Recently, the sparsely-gated Mixture-Of-Experts (MoE) architecture has garnered significant
attention. To benefit a wider audience, fine-tuning MoE models on more affordable clusters …

A Survey on Inference Optimization Techniques for Mixture of Experts Models

J Liu, P Tang, W Wang, Y Ren, X Hou, PA Heng… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of large-scale Mixture of Experts (MoE) models has marked a significant
advancement in artificial intelligence, offering enhanced model capacity and computational …

Special Session: Neuro-Symbolic Architecture Meets Large Language Models: A Memory-Centric Perspective

M Ibrahim, Z Wan, H Li, P Panda… - 2024 International …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) have significantly transformed the landscape of artificial
intelligence, demonstrating exceptional capabilities in natural language understanding and …

DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference

Y Zhang, S Aggarwal, T Mitra - arXiv preprint arXiv:2501.10375, 2025 - arxiv.org
Mixture-of-Experts (MoE) models, though highly effective for various machine learning tasks,
face significant deployment challenges on memory-constrained devices. While GPUs offer …
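The offloading theme in this last entry can be illustrated with a short sketch: keep only a few experts resident on the GPU and fetch the rest from host memory on demand. This is a plain LRU baseline meant to show the transfer-vs-capacity trade-off, not DAOP's data-aware, predictive policy; `load_fn` is a hypothetical hook standing in for the host-to-GPU copy.

```python
# Minimal sketch of expert offloading with a small GPU-resident cache.
# Generic illustration of the offloading trade-off; NOT DAOP's method.
from collections import OrderedDict

class ExpertCache:
    def __init__(self, capacity, load_fn):
        self.capacity = capacity        # how many experts fit in GPU memory
        self.load_fn = load_fn          # hypothetical hook: copies expert weights host -> GPU
        self.resident = OrderedDict()   # expert_id -> weights, kept in LRU order

    def get(self, expert_id):
        if expert_id in self.resident:            # hit: weights already on GPU
            self.resident.move_to_end(expert_id)
            return self.resident[expert_id]
        if len(self.resident) >= self.capacity:   # miss with full cache: evict LRU expert
            self.resident.popitem(last=False)
        weights = self.load_fn(expert_id)         # miss: pay a host-to-GPU transfer
        self.resident[expert_id] = weights
        return weights

# Toy usage: 8 experts, GPU room for 3; every miss stands in for the PCIe
# transfer that offloading and prefetching systems try to hide or avoid.
cache = ExpertCache(capacity=3, load_fn=lambda eid: f"weights_of_expert_{eid}")
for eid in [0, 1, 2, 0, 3, 1, 4]:
    cache.get(eid)
print(list(cache.resident))  # the 3 most recently used experts remain resident
```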