Deep Learning Workload Scheduling in GPU Datacenters: A Survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Deferred continuous batching in resource-efficient large language model serving

Y He, Y Lu, G Alonso - Proceedings of the 4th Workshop on Machine …, 2024 - dl.acm.org
Although prior work on batched inference and parameter-efficient fine-tuning has reduced the resource requirements of large language models (LLMs), challenges …

LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism

B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin - arXiv preprint arXiv:2404.09526, 2024 - arxiv.org
The context window of large language models (LLMs) is rapidly increasing, leading to a
huge variance in resource usage between different requests as well as between different …

Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

C Lin, Z Han, C Zhang, Y Yang, F Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large language models (LLMs) has enabled LLM-based applications (aka AI
agents or co-pilots), a new software paradigm that combines the strength of LLM and …

vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training

J Bang, Y Choi, M Kim, Y Kim, M Rhu - arXiv preprint arXiv:2312.12391, 2023 - arxiv.org
As large language models (LLMs) become widespread in various application domains, a
critical challenge the AI community is facing is how to train these large AI models in a cost …

MLTCP: Congestion Control for DNN Training

S Rajasekaran, S Narang, AA Zabreyko… - arXiv preprint arXiv …, 2024 - arxiv.org
We present MLTCP, a technique to augment today's congestion control algorithms to
accelerate DNN training jobs in shared GPU clusters. MLTCP enables the communication …

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving

Y Zhong, S Liu, J Chen, J Hu, Y Zhu, X Liu, X Jin… - arXiv preprint arXiv …, 2024 - arxiv.org
DistServe improves the performance of large language models (LLMs) serving by
disaggregating the prefill and decoding computation. Existing LLM serving systems colocate …

Training DNN Models over Heterogeneous Clusters with Optimal Performance

C Nie, J Maghakian, Z Liu - arXiv preprint arXiv:2402.05302, 2024 - arxiv.org
Adjusting batch sizes and adaptively tuning other hyperparameters can significantly speed
up deep neural network (DNN) training. Despite the ubiquity of heterogeneous clusters …

A Codesign of Scheduling and Parallelization for Large Model Training in Heterogeneous Clusters

C Xue, W Cui, H Zhao, Q Chen, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Joint consideration of scheduling and adaptive parallelism offers great opportunities for
improving the training efficiency of large models on heterogeneous GPU clusters. However …

Asymptotically Optimal Scheduling of Multiple Parallelizable Job Classes

B Berg, B Moseley, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Many modern computing workloads are composed of parallelizable jobs. A single
parallelizable job can be completed more quickly if it is run on additional servers, however …