A Multi-Scale Approach to Early Fire Detection in Smart Homes

A Abdusalomov, S Umirzakova, F Safarov… - Electronics, 2024 - mdpi.com
In recent years, advancements in smart home technologies have underscored the need for
the development of early fire and smoke detection systems to enhance safety and security …

Efficient training of large language models on distributed infrastructures: A survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

DistTrain: Addressing model and data heterogeneity with disaggregated training for multimodal large language models

Z Zhang, Y Zhong, R Ming, H Hu, J Sun, Z Ge… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (LLMs) have demonstrated significant potential in a wide
range of AI applications. Yet, training multimodal LLMs suffers from low efficiency and …

BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

Y Zhao, S Yang, K Zhu, L Zheng, B Kasikci… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline batch inference, which leverages the flexibility of request batching to achieve higher
throughput and lower costs, is becoming more popular for latency-insensitive applications …

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

G Wang, C Zhang, Z Shen, A Li, O Ruwase - arXiv preprint arXiv …, 2024 - arxiv.org
Given the popularity of generative AI, Large Language Models (LLMs) often consume
hundreds or thousands of GPUs for parallelizing and accelerating the training process …

OWL: Worker-Assisted Server Bandwidth Optimization for Efficient Communication Federated Learning

X Han, B Liu, C Hu, D Cheng - Journal of Parallel and Distributed …, 2024 - Elsevier
Edge computing in federated learning based on a centralized architecture often faces
communication constraints in large clusters. Although there have been some efforts like …

WallFacer: Guiding transformer model training out of the long-context dark forest with n-body problem

Z Liu, S Wang, S Cheng, Z Zhao… - arXiv preprint arXiv …, 2024 - maruyamaaya.github.io
In recent years, Transformer-based Large Language Models (LLMs) have garnered
significant attention due to their exceptional performance across a variety of tasks. However …

Demystifying Workload Imbalances in Large Transformer Model Training over Variable-length Sequences

H Li, F Fu, S Lin, H Ge, X Wang, J Niu, J Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
To optimize large Transformer model training, efficient parallel computing and advanced
data management are essential. However, current methods often assume a stable and …

ProTrain: Efficient LLM Training via Memory-Aware Techniques

H Yang, J Zhou, Y Fu, X Wang, R Roane… - arXiv preprint arXiv …, 2024 - arxiv.org
Training Large Language Models (LLMs) is extremely memory-hungry. To address this problem,
existing work exploits the combination of CPU and GPU for the training process, such as …

WallFacer: Harnessing Multi-dimensional Ring Parallelism for Efficient Long Sequence Model Training

Z Liu, S Wang, S Cheng, Z Zhao, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Training Transformer models on long sequences in a distributed setting poses significant
challenges in terms of efficiency and scalability. Current methods are either constrained by …