RSC: Accelerate graph neural networks training via randomized sparse computations

Z Liu, C Shengyuan, K Zhou, D Zha… - International …, 2023 - proceedings.mlr.press
Training graph neural networks (GNNs) is extremely time-consuming because sparse
graph-based operations are hard to accelerate on commodity hardware. Prior art successfully …
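
For context, a minimal PyTorch sketch of the sparse neighbor aggregation this snippet refers to, plus a randomized edge-sampling approximation in the spirit of the title. It is illustrative only, not the RSC algorithm itself; the toy graph, sizes, and keep probability are made up.

```python
import torch

num_nodes, hidden = 1000, 64
# Hypothetical toy graph: ~10 random edges per node, unit edge weights.
edges = torch.randint(0, num_nodes, (2, num_nodes * 10))
values = torch.ones(edges.shape[1])
A = torch.sparse_coo_tensor(edges, values, (num_nodes, num_nodes)).coalesce()
H = torch.randn(num_nodes, hidden)

# Exact neighbor aggregation: the sparse-dense matmul that dominates GNN training time.
exact = torch.sparse.mm(A, H)

# Randomized approximation: keep each edge with probability `keep_prob` and
# rescale the survivors so the estimate stays unbiased in expectation.
keep_prob = 0.5
mask = torch.rand(A.values().shape[0]) < keep_prob
A_sub = torch.sparse_coo_tensor(
    A.indices()[:, mask], A.values()[mask] / keep_prob, A.shape
).coalesce()
approx = torch.sparse.mm(A_sub, H)

print("mean absolute error:", (exact - approx).abs().mean().item())
```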

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

F Liang, Z Zhang, H Lu, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …

Cupcake: A compression scheduler for scalable communication-efficient distributed training

Z Wang, X Wu, Z Xu, TS Ng - Proceedings of Machine …, 2023 - proceedings.mlsys.org
Data-parallel distributed training (DDT) is the de facto way to accelerate deep learning on
multiple GPUs. In DDT, communication for gradient synchronization is the major efficiency …
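
A minimal sketch of the gradient compression primitive that a scheduler like Cupcake coordinates: top-k sparsification turns a dense gradient into (indices, values) so far fewer bytes cross the network per synchronization. The function names and the 1% ratio are illustrative, not taken from the paper.

```python
import torch

def topk_compress(grad: torch.Tensor, ratio: float = 0.01):
    # Keep only the largest-magnitude entries of the gradient.
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices], grad.shape

def topk_decompress(indices, values, shape):
    # Scatter the kept entries back into a zero tensor of the original shape.
    flat = torch.zeros(shape, dtype=values.dtype).flatten()
    flat[indices] = values
    return flat.reshape(shape)

grad = torch.randn(1024, 1024)
idx, vals, shape = topk_compress(grad)
restored = topk_decompress(idx, vals, shape)  # sparse approximation of grad
print(idx.numel(), "of", grad.numel(), "entries would be communicated")
```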

Gemini: Fast failure recovery in distributed training with in-memory checkpoints

Z Wang, Z Jia, S Zheng, Z Zhang, X Fu… - Proceedings of the 29th …, 2023 - dl.acm.org
Large deep learning models have recently garnered substantial attention from both
academia and industry. Nonetheless, frequent failures are observed during large model …
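
A minimal sketch of the local primitive behind in-memory checkpointing: keep a copy of model and optimizer state in host RAM so recovery restarts from a recent step instead of an infrequent remote checkpoint. Gemini's actual design is considerably more involved than this; the model, interval, and helpers below are illustrative.

```python
import copy
import torch

model = torch.nn.Linear(128, 128)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def snapshot(model, opt):
    # Detached CPU copies; cheap relative to serializing to remote storage.
    return {
        "model": {k: v.detach().cpu().clone() for k, v in model.state_dict().items()},
        "opt": copy.deepcopy(opt.state_dict()),
    }

def restore(model, opt, ckpt):
    model.load_state_dict(ckpt["model"])
    opt.load_state_dict(ckpt["opt"])

ckpt = None
for step in range(100):
    loss = model(torch.randn(32, 128)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 10 == 0:
        ckpt = snapshot(model, opt)  # frequent, because RAM checkpoints are cheap

restore(model, opt, ckpt)  # e.g. after a simulated failure
```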

DIVISION: memory efficient training via dual activation precision

G Wang, Z Liu, Z Jiang, N Liu… - … on Machine Learning, 2023 - proceedings.mlr.press
Activation compressed training offers a way to reduce the memory cost of training
deep neural networks (DNNs). However, state-of-the-art work combines a search of …
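
A generic sketch of activation compressed training, the setting this paper works in: a custom autograd function caches a low-precision copy of the activation for the backward pass instead of the full fp32 tensor. This shows the general idea only, not DIVISION's dual-precision scheme; the fp16 choice and names are illustrative.

```python
import torch

class CompressedLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        # Stash a low-precision copy of the activation for backward.
        ctx.save_for_backward(x.to(torch.float16), weight)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_fp16, weight = ctx.saved_tensors
        grad_x = grad_out @ weight
        # Decompress the cached activation before using it for the weight gradient.
        grad_w = grad_out.t() @ x_fp16.to(grad_out.dtype)
        return grad_x, grad_w

x = torch.randn(32, 128, requires_grad=True)
w = torch.randn(64, 128, requires_grad=True)
loss = CompressedLinear.apply(x, w).sum()
loss.backward()
print(x.grad.shape, w.grad.shape)
```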

Zen: Near-optimal sparse tensor synchronization for distributed DNN training

Z Wang, Z Xu, A Shrivastava, TS Ng - arXiv preprint arXiv:2309.13254, 2023 - arxiv.org
Distributed training is the de facto standard to scale up the training of Deep Neural Networks
(DNNs) with multiple GPUs. The performance bottleneck of distributed training lies in …
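
A minimal sketch of the sparse-tensor synchronization problem this title refers to: each worker holds a mostly-zero gradient (e.g. from an embedding layer), so exchanging only (index, value) pairs and merging them is far cheaper than a dense allreduce. Workers are simulated in one process here; this is not Zen's algorithm or a real collective.

```python
import torch

def sparsify(grad):
    # Represent a mostly-zero 1-D gradient as (indices, values).
    idx = grad.nonzero(as_tuple=True)[0]
    return idx, grad[idx]

def merge(shape, contributions):
    # Sum all workers' sparse contributions, adding overlapping indices.
    out = torch.zeros(shape)
    for idx, vals in contributions:
        out.index_add_(0, idx, vals)
    return out

dim = 10_000
workers = []
for _ in range(4):
    g = torch.zeros(dim)
    touched = torch.randint(0, dim, (50,))
    g[touched] = torch.randn(50)
    workers.append(sparsify(g))

synced = merge((dim,), workers)
print(sum(v.numel() for _, v in workers), "values exchanged instead of", dim * 4)
```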

StructDrop: A structured random algorithm towards efficient large-scale graph training

H Liu, Z Liu, K Zhou, T Zhao, N Shah, X Hu - 2023 - openreview.net
Graph neural networks (GNNs) have gained considerable success in graph-based learning
tasks, yet training GNNs on large graphs is still inefficient. The root cause is the graph-based …

[PDF] TS Eugene Ng

Z Wang - 2023 - repository.rice.edu
Deep neural networks (DNNs) have achieved unparalleled performance in numerous fields,
including computer vision, natural language processing, and recommendation systems …