A Multi-Scale Approach to Early Fire Detection in Smart Homes

A Abdusalomov, S Umirzakova, F Safarov… - Electronics, 2024 - mdpi.com
In recent years, advancements in smart home technologies have underscored the need for
the development of early fire and smoke detection systems to enhance safety and security …

Efficient training of large language models on distributed infrastructures: A survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

DistTrain: Addressing model and data heterogeneity with disaggregated training for multimodal large language models

Z Zhang, Y Zhong, R Ming, H Hu, J Sun, Z Ge… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (LLMs) have demonstrated significant potential in a wide
range of AI applications. Yet, training multimodal LLMs suffers from low efficiency and …

BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

Y Zhao, S Yang, K Zhu, L Zheng, B Kasikci… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline batch inference, which leverages the flexibility of request batching to achieve higher
throughput and lower costs, is becoming more popular for latency-insensitive applications …

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

G Wang, C Zhang, Z Shen, A Li, O Ruwase - arXiv preprint arXiv …, 2024 - arxiv.org
Given the popularity of generative AI, Large Language Models (LLMs) often consume
hundreds or thousands of GPUs for parallelizing and accelerating the training process …

OWL: Worker-Assisted Server Bandwidth Optimization for Efficient Communication Federated Learning

X Han, B Liu, C Hu, D Cheng - Journal of Parallel and Distributed …, 2024 - Elsevier
Edge computing in federated learning based on a centralized architecture often faces
communication constraints in large clusters. Although there have been some efforts like …

WallFacer: Guiding transformer model training out of the long-context dark forest with n-body problem

Z Liu, S Wang, S Cheng, Z Zhao… - arXiv preprint arXiv …, 2024 - maruyamaaya.github.io
In recent years, Transformer-based Large Language Models (LLMs) have garnered
significant attention due to their exceptional performance across a variety of tasks. However …

Demystifying Workload Imbalances in Large Transformer Model Training over Variable-length Sequences

H Li, F Fu, S Lin, H Ge, X Wang, J Niu, J Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
To optimize large Transformer model training, efficient parallel computing and advanced
data management are essential. However, current methods often assume a stable and …

ProTrain: Efficient LLM Training via Memory-Aware Techniques

H Yang, J Zhou, Y Fu, X Wang, R Roane… - arXiv preprint arXiv …, 2024 - arxiv.org
Training Large Language Models (LLMs) is extremely memory-hungry. To address this problem,
existing work exploits the combination of CPU and GPU for the training process, such as …

WallFacer: Harnessing Multi-dimensional Ring Parallelism for Efficient Long Sequence Model Training

Z Liu, S Wang, S Cheng, Z Zhao, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Training Transformer models on long sequences in a distributed setting poses significant
challenges in terms of efficiency and scalability. Current methods are either constrained by …