GradientFlow: Optimizing network performance for large-scale distributed DNN training

P Sun, Y Wen, R Han, W Feng… - IEEE Transactions on Big Data, 2019 - ieeexplore.ieee.org
It is important to scale out deep neural network (DNN) training to reduce model training
time. High communication overhead is one of the major performance bottlenecks for …

Optimizing network performance for distributed DNN training on GPU clusters: ImageNet/AlexNet training in 1.5 minutes

P Sun, W Feng, R Han, S Yan, Y Wen - arXiv preprint arXiv:1902.06855, 2019 - arxiv.org
It is important to scale out deep neural network (DNN) training to reduce model training
time. High communication overhead is one of the major performance bottlenecks for …

dPRO: A generic performance diagnosis and optimization toolkit for expediting distributed DNN training

H Hu, C Jiang, Y Zhong, Y Peng, C Wu… - Proceedings of Machine Learning and Systems, 2022 - proceedings.mlsys.org
Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning
DNN models over large datasets. However, the performance of large-scale distributed …

BlueConnect: Decomposing all-reduce for deep learning on heterogeneous network hierarchy

M Cho, U Finkler, D Kung… - Proceedings of Machine Learning and Systems, 2019 - proceedings.mlsys.org
As deep neural networks get more complex and input datasets get larger, it can take days or
even weeks to train a deep neural network to the desired accuracy. Therefore, enabling …
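The decomposition named in the BlueConnect title can be pictured with a generic two-level all-reduce: an intra-node reduce-scatter, an inter-node reduction of each shard, and an intra-node all-gather. The sketch below simulates that pattern with NumPy arrays in a single process; it is an assumption-laden illustration of the general idea, not BlueConnect's actual pipelined algorithm, and the function name, shapes, and group sizes are invented for the example.

# Illustrative sketch only: a generic two-level all-reduce decomposition
# (intra-node reduce-scatter, per-shard inter-node reduction, intra-node
# all-gather), simulated with NumPy instead of a real communication library.
# Not BlueConnect's algorithm; names and shapes are assumptions.
import numpy as np

def hierarchical_allreduce(grads, nodes, gpus_per_node):
    """grads: list of length nodes*gpus_per_node, each a 1-D np.ndarray."""
    dim = grads[0].size
    shard = dim // gpus_per_node                     # each local rank owns one shard

    # Step 1: intra-node reduce-scatter -- rank g on each node sums shard g.
    partial = np.zeros((nodes, gpus_per_node, shard))
    for n in range(nodes):
        for g in range(gpus_per_node):
            lo, hi = g * shard, (g + 1) * shard
            partial[n, g] = sum(grads[n * gpus_per_node + k][lo:hi]
                                for k in range(gpus_per_node))

    # Step 2: inter-node reduction, one independent (concurrent in practice)
    # all-reduce per shard index across the nodes.
    reduced = partial.sum(axis=0)                    # shape: (gpus_per_node, shard)

    # Step 3: intra-node all-gather -- every GPU reassembles the full vector.
    full = np.concatenate([reduced[g] for g in range(gpus_per_node)])
    return [full.copy() for _ in range(nodes * gpus_per_node)]

# Quick check against a flat sum over all ranks: 2 nodes x 4 GPUs, dim 8.
grads = [np.random.rand(8) for _ in range(2 * 4)]
out = hierarchical_allreduce(grads, nodes=2, gpus_per_node=4)
assert np.allclose(out[0], sum(grads))

Splitting the reduction this way lets each level of the hierarchy run a collective sized to its own bandwidth, which is the usual motivation for decomposed all-reduce on heterogeneous networks.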

Parameter Box: High performance parameter servers for efficient distributed deep neural network training

L Luo, J Nelson, L Ceze, A Phanishayee… - arXiv preprint arXiv …, 2018 - arxiv.org
Most work in the deep learning systems community has focused on faster inference, but
arriving at a trained model requires lengthy experiments. Accelerating training lets …
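For context on the parameter-server pattern that the Parameter Box title refers to, the following toy sketch emulates it in a single Python process: workers pull weights, push gradients, and the server applies an SGD step. It is a minimal illustration under assumed names (ParameterServer, push, pull) and says nothing about the paper's hardware-centric design.

# Illustrative sketch only: a single-process emulation of the generic
# parameter-server pattern. Workers push gradients, the server applies an
# SGD update, workers pull fresh weights. Names and sizes are assumptions.
import numpy as np

class ParameterServer:
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr

    def push(self, grad):          # worker -> server: apply one gradient
        self.w -= self.lr * grad

    def pull(self):                # server -> worker: latest weights
        return self.w.copy()

def worker_gradient(w, x, y):
    """Least-squares gradient on one worker's mini-batch."""
    return 2.0 * x.T @ (x @ w - y) / len(y)

rng = np.random.default_rng(0)
true_w = rng.normal(size=4)
ps = ParameterServer(dim=4)

for step in range(200):
    for _ in range(2):             # two workers per round, synchronous for simplicity
        x = rng.normal(size=(32, 4))
        y = x @ true_w
        ps.push(worker_gradient(ps.pull(), x, y))

print("recovered weights:", np.round(ps.pull(), 3))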

Domain-specific communication optimization for distributed DNN training

H Wang, J Chen, X Wan, H Tian, J Xia, G Zeng… - arXiv preprint arXiv …, 2020 - arxiv.org
Communication overhead poses an important obstacle to distributed DNN training and has
drawn increasing attention in recent years. Despite continuous efforts, prior solutions such …

Prophet: Speeding up distributed DNN training with predictable communication scheduling

Z Zhang, Q Qi, R Shang, L Chen, F Xu - Proceedings of the 50th International Conference on Parallel Processing, 2021 - dl.acm.org
Optimizing performance for Distributed Deep Neural Network (DDNN) training has recently
become increasingly compelling, as DNN models grow more complex and training datasets …
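The snippet below is a toy timing model of why communication scheduling matters: a backward pass emits gradients back-to-front, while the next forward pass consumes parameters front-to-back, so reordering transfers on a shared link can shorten an iteration. The timings and the FIFO-versus-priority comparison are made-up assumptions for illustration, not Prophet's predictable-scheduling algorithm.

# Illustrative sketch only: simulate one iteration under two transfer orders
# on a single shared link. FIFO sends gradients in production order
# (last layer first); priority sends the lowest-index ready layer first,
# so the next forward pass can begin sooner. All timings are assumptions.
BWD = [4, 4, 4, 4]    # backward compute time per layer (runs layer 3 -> 0)
COMM = [6, 6, 6, 6]   # gradient transfer time per layer (one shared link)
FWD = [2, 2, 2, 2]    # forward compute time per layer (runs layer 0 -> 3)

def iteration_time(priority):
    n = len(BWD)
    # Backward pass runs back-to-front; record when each gradient is ready.
    ready, t = [0.0] * n, 0.0
    for layer in reversed(range(n)):
        t += BWD[layer]
        ready[layer] = t

    # Serve transfers one at a time: FIFO = production order,
    # priority = smallest layer index among gradients already produced.
    done, pending, link_free = [0.0] * n, set(range(n)), 0.0
    while pending:
        now = max(link_free, min(ready[l] for l in pending))
        avail = [l for l in pending if ready[l] <= now]
        layer = min(avail) if priority else max(avail)   # max = production order
        link_free = now + COMM[layer]
        done[layer] = link_free
        pending.remove(layer)

    # Next forward pass: layer i needs its fresh parameters and layer i-1's output.
    t = 0.0
    for layer in range(n):
        t = max(t, done[layer]) + FWD[layer]
    return t

print("FIFO schedule:    ", iteration_time(priority=False))
print("Priority schedule:", iteration_time(priority=True))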

EFLOPS: Algorithm and system co-design for a high performance distributed training platform

J Dong, Z Cao, T Zhang, J Ye, S Wang… - IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020 - ieeexplore.ieee.org
Deep neural networks (DNNs) have gained tremendous attention as compelling solutions
for applications such as image classification, object detection, speech recognition, and so …

NV-Group: Link-efficient reduction for distributed deep learning on modern dense GPU systems

CH Chu, P Kousha, AA Awan, KS Khorassani… - Proceedings of the 34th ACM International Conference on Supercomputing, 2020 - dl.acm.org
Advanced fabrics such as NVIDIA NVLink are enabling the deployment of dense Graphics
Processing Unit (GPU) systems such as DGX-2 and Summit. With the wide adoption of large …

PowerAI DDL

M Cho, U Finkler, S Kumar, D Kung, V Saxena… - arXiv preprint arXiv …, 2017 - arxiv.org
As deep neural networks become more complex and input datasets grow larger, it can take
days or even weeks to train a deep neural network to the desired accuracy. Therefore …