Authors
Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang
Publication date
2024/4/27
Book
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
Pages
1095-1111
Description
Recent years have seen an increase in the development of large deep learning (DL) models, making training efficiency crucial. Common practice struggles with the trade-off between usability and performance. On one hand, DL frameworks such as PyTorch use dynamic graphs to facilitate model development, at the price of sub-optimal training performance. On the other hand, practitioners have proposed various approaches to improving training efficiency by sacrificing some flexibility, ranging from making the graph static for more thorough optimization (e.g., XLA) to customizing optimizations for large-scale distributed training (e.g., DeepSpeed and Megatron-LM).
In this paper, we aim to address the tension between usability and training efficiency through separation of concerns. Inspired by DL compilers that decouple the platform-specific optimizations of a tensor-level operator from its …
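To make the decoupling idea concrete, below is a minimal, hypothetical Python sketch, not the paper's actual API: the Schedule class and its replace and checkpoint methods are illustrative names invented here. The model stays a plain PyTorch definition, while performance decisions (swapping a submodule, activation checkpointing) are applied separately by name.

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class MLP(nn.Module):
    """Model definition: written for clarity, with no optimization logic mixed in."""

    def __init__(self, dim=1024):
        super().__init__()
        self.fc1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(4 * dim, dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


class Schedule:
    """Hypothetical schedule: optimization decisions kept separate from the definition."""

    def __init__(self, model):
        self.model = model

    def replace(self, name, new_module):
        # Swap a named submodule for an alternative (e.g., fused) implementation.
        setattr(self.model, name, new_module)

    def checkpoint(self, name):
        # Wrap a submodule's forward with activation checkpointing to save memory.
        sub = getattr(self.model, name)
        orig_forward = sub.forward
        sub.forward = lambda *args: checkpoint(orig_forward, *args, use_reentrant=False)


model = MLP()
sch = Schedule(model)
sch.replace("act", nn.ReLU())  # e.g., substitute a cheaper activation
sch.checkpoint("fc1")          # e.g., trade recomputation for activation memory
out = model(torch.randn(2, 1024, requires_grad=True))

Under this split, the readable model definition is untouched while optimizations are composed on top of it, which is the separation of concerns the abstract argues for.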