Dive into deep learning

A Zhang, ZC Lipton, M Li, AJ Smola - arXiv preprint arXiv:2106.11342, 2021 - arxiv.org
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …

Blink: Fast and generic collectives for distributed ml

G Wang, S Venkataraman… - Proceedings of …, 2020 - proceedings.mlsys.org
Model parameter synchronization across GPUs introduces high overheads for data-parallel training at scale. Existing parameter synchronization protocols cannot effectively …
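As an illustration of the collective these systems target, here is a minimal sketch of the gradient all-reduce step in data-parallel training, written against PyTorch's torch.distributed purely as a stand-in; it is not Blink's own API, which provides its own collective primitives.

```python
# Minimal sketch of the per-iteration gradient all-reduce that data-parallel
# training performs; collective libraries such as Blink aim to make this step
# fast. torch.distributed is used here only as a generic stand-in.
import torch
import torch.distributed as dist

def allreduce_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all workers after the local backward pass."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)  # sum across GPUs
            param.grad /= world_size                           # then average
```

In this sketch, each worker would call allreduce_gradients(model) after loss.backward() and before optimizer.step(), assuming the default process group has already been initialized with dist.init_process_group.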

Parameter hub: a rack-scale parameter server for distributed deep neural network training

L Luo, J Nelson, L Ceze, A Phanishayee… - Proceedings of the …, 2018 - dl.acm.org
Distributed deep neural network (DDNN) training constitutes an increasingly important
workload that frequently runs in the cloud. Larger DNN models and faster compute engines …
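To make the workload concrete, below is a toy single-process sketch of the push/pull parameter-server pattern that rack-scale designs such as Parameter Hub accelerate; the class and method names are illustrative assumptions, not PHub's interface.

```python
# Toy sketch of the parameter-server push/pull pattern underlying DDNN
# training. All names here are illustrative; this is not PHub's API.
import numpy as np

class ParameterServer:
    def __init__(self, shapes, lr=0.1):
        # One shared parameter tensor per key, initialized to zero.
        self.params = {k: np.zeros(s, dtype=np.float32) for k, s in shapes.items()}
        self.lr = lr

    def push(self, grads):
        # Workers push gradients; the server applies an SGD update.
        for k, g in grads.items():
            self.params[k] -= self.lr * g

    def pull(self):
        # Workers pull the latest parameters before the next iteration.
        return {k: v.copy() for k, v in self.params.items()}

# Usage: two workers push gradients for one shared weight matrix.
server = ParameterServer({"w": (2, 2)})
for worker_grad in [np.ones((2, 2)), 2 * np.ones((2, 2))]:
    server.push({"w": worker_grad.astype(np.float32)})
latest = server.pull()
```

In a real deployment the push and pull calls cross the network, which is why rack-scale placement and communication scheduling matter.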

Communication algorithm-architecture co-design for distributed deep learning

J Huang, P Majumder, S Kim, A Muzahid… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Large-scale distributed deep learning training has enabled the development of more complex deep neural network models that learn from larger datasets for sophisticated tasks. In …

[CITATION][C] Efficient Interconnection Network Design for Heterogeneous Architectures

J Huang - 2020

[CITATION][C] Tree-based allreduce communication on mxnet

C Yang (Amazon Web Services) - Tech. Rep., 2018
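For context, a minimal single-process simulation of a tree-based allreduce is sketched below: values are reduced up a binary tree to the root, then the result is broadcast back down. It only illustrates the communication pattern and is not MXNet's kvstore implementation.

```python
# Single-process illustration of a tree-based allreduce over a binary tree:
# a reduction phase toward the root, then a broadcast phase back to the
# leaves. Not MXNet's actual implementation, just the pattern.
def tree_allreduce(values):
    n = len(values)

    def reduce_up(node):
        # Sum this node's value with the reduced values of its children.
        total = values[node]
        for child in (2 * node + 1, 2 * node + 2):
            if child < n:
                total += reduce_up(child)
        return total

    def broadcast_down(node, result, out):
        # Propagate the root's result to every node in the tree.
        out[node] = result
        for child in (2 * node + 1, 2 * node + 2):
            if child < n:
                broadcast_down(child, result, out)

    total = reduce_up(0)             # reduction phase
    out = [None] * n
    broadcast_down(0, total, out)    # broadcast phase
    return out

print(tree_allreduce([1.0, 2.0, 3.0, 4.0]))  # every rank ends with 10.0
```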