Offloading machine learning to programmable data planes: A systematic survey

R Parizotto, BL Coelho, DC Nunes, I Haque… - ACM Computing …, 2023 - dl.acm.org
The demand for machine learning (ML) has increased significantly in recent decades,
enabling several applications, such as speech recognition, computer vision, and …

Overlap communication with dependent computation via decomposition in large deep learning models

S Wang, J Wei, A Sabne, A Davis, B Ilbeyi… - Proceedings of the 28th …, 2022 - dl.acm.org
Large deep learning models have shown great potential with state-of-the-art results in many
tasks. However, running these large models is quite challenging on an accelerator (GPU or …

Galvatron: Efficient transformer training over multiple GPUs using automatic parallelism

X Miao, Y Wang, Y Jiang, C Shi, X Nie, H Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer models have achieved state-of-the-art performance on various domains of
applications and have gradually become the foundation of advanced large deep learning …

ASTRA-sim2.0: Modeling hierarchical networks and disaggregated systems for large-model training at scale

W Won, T Heo, S Rashidi, S Sridharan… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org
As deep learning models and input data continue to scale at an unprecedented rate, it has
become inevitable to move towards distributed training platforms to fit the models and …

Congestion control in machine learning clusters

S Rajasekaran, M Ghobadi, G Kumar… - Proceedings of the 21st …, 2022 - dl.acm.org
This paper argues that fair-sharing, the holy grail of congestion control algorithms for
decades, is not necessarily a desirable property in Machine Learning (ML) training clusters …

HammingMesh: a network topology for large-scale deep learning

T Hoefler, T Bonato, D De Sensi… - … Conference for High …, 2022 - ieeexplore.ieee.org
Numerous microarchitectural optimizations unlocked tremendous processing power for
deep neural networks that in turn fueled the AI revolution. With the exhaustion of such …

Themis: A network bandwidth-aware collective scheduling policy for distributed training of DL models

S Rashidi, W Won, S Srinivasan, S Sridharan… - Proceedings of the 49th …, 2022 - dl.acm.org
Distributed training is a solution to reduce DNN training time by splitting the task across
multiple NPUs (e.g., GPUs/TPUs). However, distributed training adds communication overhead …

Peta-scale embedded photonics architecture for distributed deep learning applications

Z Wu, LY Dai, A Novick, M Glick, Z Zhu… - Journal of Lightwave …, 2023 - ieeexplore.ieee.org
As Deep Learning (DL) models grow larger and more complex, training jobs are
increasingly distributed across multiple Computing Units (CUs) such as GPUs and TPUs …

CoDG-ReRAM: An algorithm-hardware co-design to accelerate semi-structured GNNs on ReRAM

Y Luo, P Behnam, K Thorat, Z Liu… - 2022 IEEE 40th …, 2022 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) have attracted wide attention and are being applied in the real
world. However, due to ever-growing graph data with significant irregularities, off-chip …

Logical/physical topology-aware collective communication in deep learning training

S Cho, H Son, J Kim - 2023 IEEE International Symposium on …, 2023 - ieeexplore.ieee.org
Training is an essential step in deep learning that enables network models to be deployed.
To scale training, multiple GPUs are commonly used with data parallelism to exploit the …