ASTRA-sim2.0: Modeling hierarchical networks and disaggregated systems for large-model training at scale

W Won, T Heo, S Rashidi, S Sridharan… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org
As deep learning models and input data continue to scale at an unprecedented rate, it has
become inevitable to move towards distributed training platforms to fit the models and …

Darl: Distributed reconfigurable accelerator for hyperdimensional reinforcement learning

H Chen, M Issa, Y Ni, M Imani - Proceedings of the 41st IEEE/ACM …, 2022 - dl.acm.org
Reinforcement Learning (RL) is a powerful technique for solving decision-making problems
such as robotics control. Modern RL algorithms, e.g., Deep Q-Learning, are based on costly …

Enabling compute-communication overlap in distributed deep learning training platforms

S Rashidi, M Denton, S Sridharan… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators
(e.g., GPU/TPU) via fast, customized interconnects with 100s of gigabytes per second (GB/s) of bandwidth …

Themis: A network bandwidth-aware collective scheduling policy for distributed training of DL models

S Rashidi, W Won, S Srinivasan, S Sridharan… - Proceedings of the 49th …, 2022 - dl.acm.org
Distributed training reduces DNN training time by splitting the task across
multiple NPUs (e.g., GPU/TPU). However, distributed training adds communication overhead …

Demystifying BERT: System design implications

S Pati, S Aga, N Jayasena… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Transfer learning in natural language processing (NLP) uses increasingly large models that
tackle challenging problems. Consequently, these applications are driving the requirements …

RoSÉ: A hardware-software co-simulation infrastructure enabling pre-silicon full-stack robotics SoC evaluation

D Nikiforov, SC Dong, CL Zhang, S Kim… - Proceedings of the 50th …, 2023 - dl.acm.org
Robotic systems, such as autonomous unmanned aerial vehicles (UAVs) and self-driving
cars, have been widely deployed in many scenarios and have the potential to revolutionize …

Peta-scale embedded photonics architecture for distributed deep learning applications

Z Wu, LY Dai, A Novick, M Glick, Z Zhu… - Journal of Lightwave …, 2023 - ieeexplore.ieee.org
As Deep Learning (DL) models grow larger and more complex, training jobs are
increasingly distributed across multiple Computing Units (CUs) such as GPUs and TPUs …

Communication algorithm-architecture co-design for distributed deep learning

J Huang, P Majumder, S Kim, A Muzahid… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Large-scale distributed deep learning training has enabled the development of more complex
deep neural network models that learn from larger datasets for sophisticated tasks. In …

Efficient distributed inference of deep neural networks via restructuring and pruning

A Abdi, S Rashidi, F Fekri, T Krishna - Proceedings of the AAAI …, 2023 - ojs.aaai.org
In this paper, we consider the parallel implementation of an already-trained deep model on
multiple processing nodes (a.k.a. workers). Specifically, we investigate how a deep …

Logical/physical topology-aware collective communication in deep learning training

S Cho, H Son, J Kim - 2023 IEEE International Symposium on …, 2023 - ieeexplore.ieee.org
Training is an essential step in deep learning that enables network models to be deployed.
To scale training, multiple GPUs are commonly used with data parallelism to exploit the …