RSC: Accelerate graph neural networks training via randomized sparse computations

Z Liu, C Shengyuan, K Zhou, D Zha… - International …, 2023 - proceedings.mlr.press
Training graph neural networks (GNNs) is extremely time-consuming because sparse
graph-based operations are hard to accelerate on commodity hardware. Prior art successfully …
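
For context, a minimal PyTorch sketch of the sparse neighbor aggregation this snippet refers to, plus a randomized edge-sampling approximation in the spirit of the title. It is illustrative only, not the RSC algorithm itself; the toy graph, sizes, and keep probability are made up.

```python
import torch

num_nodes, hidden = 1000, 64
# Hypothetical toy graph: ~10 random edges per node, unit edge weights.
edges = torch.randint(0, num_nodes, (2, num_nodes * 10))
values = torch.ones(edges.shape[1])
A = torch.sparse_coo_tensor(edges, values, (num_nodes, num_nodes)).coalesce()
H = torch.randn(num_nodes, hidden)

# Exact neighbor aggregation: the sparse-dense matmul that dominates GNN training time.
exact = torch.sparse.mm(A, H)

# Randomized approximation: keep each edge with probability `keep_prob` and
# rescale the survivors so the estimate stays unbiased in expectation.
keep_prob = 0.5
mask = torch.rand(A.values().shape[0]) < keep_prob
A_sub = torch.sparse_coo_tensor(
    A.indices()[:, mask], A.values()[mask] / keep_prob, A.shape
).coalesce()
approx = torch.sparse.mm(A_sub, H)

print("mean absolute error:", (exact - approx).abs().mean().item())
```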

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

F Liang, Z Zhang, H Lu, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …

Cupcake: A compression scheduler for scalable communication-efficient distributed training

Z Wang, X Wu, Z Xu, TS Ng - Proceedings of Machine …, 2023 - proceedings.mlsys.org
Data-parallel distributed training (DDT) is the de facto way to accelerate deep learning on
multiple GPUs. In DDT, communication for gradient synchronization is the major efficiency …
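
A minimal sketch of the gradient compression primitive that a scheduler like Cupcake coordinates: top-k sparsification turns a dense gradient into (indices, values) so far fewer bytes cross the network per synchronization. The function names and the 1% ratio are illustrative, not taken from the paper.

```python
import torch

def topk_compress(grad: torch.Tensor, ratio: float = 0.01):
    # Keep only the largest-magnitude entries of the gradient.
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices], grad.shape

def topk_decompress(indices, values, shape):
    # Scatter the kept entries back into a zero tensor of the original shape.
    flat = torch.zeros(shape, dtype=values.dtype).flatten()
    flat[indices] = values
    return flat.reshape(shape)

grad = torch.randn(1024, 1024)
idx, vals, shape = topk_compress(grad)
restored = topk_decompress(idx, vals, shape)  # sparse approximation of grad
print(idx.numel(), "of", grad.numel(), "entries would be communicated")
```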

Gemini: Fast failure recovery in distributed training with in-memory checkpoints

Z Wang, Z Jia, S Zheng, Z Zhang, X Fu… - Proceedings of the 29th …, 2023 - dl.acm.org
Large deep learning models have recently garnered substantial attention from both
academia and industry. Nonetheless, frequent failures are observed during large model …
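
A minimal sketch of the local primitive behind in-memory checkpointing: keep a copy of model and optimizer state in host RAM so recovery restarts from a recent step instead of an infrequent remote checkpoint. Gemini's actual design is considerably more involved than this; the model, interval, and helpers below are illustrative.

```python
import copy
import torch

model = torch.nn.Linear(128, 128)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def snapshot(model, opt):
    # Detached CPU copies; cheap relative to serializing to remote storage.
    return {
        "model": {k: v.detach().cpu().clone() for k, v in model.state_dict().items()},
        "opt": copy.deepcopy(opt.state_dict()),
    }

def restore(model, opt, ckpt):
    model.load_state_dict(ckpt["model"])
    opt.load_state_dict(ckpt["opt"])

ckpt = None
for step in range(100):
    loss = model(torch.randn(32, 128)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 10 == 0:
        ckpt = snapshot(model, opt)  # frequent, because RAM checkpoints are cheap

restore(model, opt, ckpt)  # e.g. after a simulated failure
```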

DIVISION: memory efficient training via dual activation precision

G Wang, Z Liu, Z Jiang, N Liu… - … on Machine Learning, 2023 - proceedings.mlr.press
Activation compressed training offers a way to reduce the memory cost of training
deep neural networks (DNNs). However, state-of-the-art work combines a search of …
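
A generic sketch of activation compressed training, the setting this paper works in: a custom autograd function caches a low-precision copy of the activation for the backward pass instead of the full fp32 tensor. This shows the general idea only, not DIVISION's dual-precision scheme; the fp16 choice and names are illustrative.

```python
import torch

class CompressedLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        # Stash a low-precision copy of the activation for backward.
        ctx.save_for_backward(x.to(torch.float16), weight)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_fp16, weight = ctx.saved_tensors
        grad_x = grad_out @ weight
        # Decompress the cached activation before using it for the weight gradient.
        grad_w = grad_out.t() @ x_fp16.to(grad_out.dtype)
        return grad_x, grad_w

x = torch.randn(32, 128, requires_grad=True)
w = torch.randn(64, 128, requires_grad=True)
loss = CompressedLinear.apply(x, w).sum()
loss.backward()
print(x.grad.shape, w.grad.shape)
```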

Zen: Near-optimal sparse tensor synchronization for distributed DNN training

Z Wang, Z Xu, A Shrivastava, TS Ng - arXiv preprint arXiv:2309.13254, 2023 - arxiv.org
Distributed training is the de facto standard to scale up the training of Deep Neural Networks
(DNNs) with multiple GPUs. The performance bottleneck of distributed training lies in …
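
A minimal sketch of the sparse-tensor synchronization problem this title refers to: each worker holds a mostly-zero gradient (e.g. from an embedding layer), so exchanging only (index, value) pairs and merging them is far cheaper than a dense allreduce. Workers are simulated in one process here; this is not Zen's algorithm or a real collective.

```python
import torch

def sparsify(grad):
    # Represent a mostly-zero 1-D gradient as (indices, values).
    idx = grad.nonzero(as_tuple=True)[0]
    return idx, grad[idx]

def merge(shape, contributions):
    # Sum all workers' sparse contributions, adding overlapping indices.
    out = torch.zeros(shape)
    for idx, vals in contributions:
        out.index_add_(0, idx, vals)
    return out

dim = 10_000
workers = []
for _ in range(4):
    g = torch.zeros(dim)
    touched = torch.randint(0, dim, (50,))
    g[touched] = torch.randn(50)
    workers.append(sparsify(g))

synced = merge((dim,), workers)
print(sum(v.numel() for _, v in workers), "values exchanged instead of", dim * 4)
```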

StructDrop: A structured random algorithm towards efficient large-scale graph training

H Liu, Z Liu, K Zhou, T Zhao, N Shah, X Hu - 2023 - openreview.net
Graph neural networks (GNNs) have gained considerable success in graph-based learning
tasks, yet training GNNs on large graphs is still inefficient. The root cause is the graph-based …

[PDF] TS Eugene Ng

Z Wang - 2023 - repository.rice.edu
Deep neural networks (DNNs) have achieved unparalleled performance in numerous fields,
including computer vision, natural language processing, and recommendation systems …