ASTRA-sim2.0: Modeling hierarchical networks and disaggregated systems for large-model training at scale

W Won, T Heo, S Rashidi, S Sridharan… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org
As deep learning models and input data continue to scale at an unprecedented rate, it has
become inevitable to move towards distributed training platforms to fit the models and …

Darl: Distributed reconfigurable accelerator for hyperdimensional reinforcement learning

H Chen, M Issa, Y Ni, M Imani - Proceedings of the 41st IEEE/ACM …, 2022 - dl.acm.org
Reinforcement Learning (RL) is a powerful technique for solving decision-making problems
such as robotics control. Modern RL algorithms, e.g., Deep Q-Learning, are based on costly …

Enabling compute-communication overlap in distributed deep learning training platforms

S Rashidi, M Denton, S Sridharan… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators
(e.g., GPU/TPU) via fast, customized interconnects with 100s of gigabytes per second (GB/s) of bandwidth …

Themis: A network bandwidth-aware collective scheduling policy for distributed training of DL models

S Rashidi, W Won, S Srinivasan, S Sridharan… - Proceedings of the 49th …, 2022 - dl.acm.org
Distributed training reduces DNN training time by splitting the task across
multiple NPUs (e.g., GPU/TPU). However, distributed training adds communication overhead …

Demystifying BERT: System design implications

S Pati, S Aga, N Jayasena… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Transfer learning in natural language processing (NLP) uses increasingly large models that
tackle challenging problems. Consequently, these applications are driving the requirements …

RoSÉ: A hardware-software co-simulation infrastructure enabling pre-silicon full-stack robotics SoC evaluation

D Nikiforov, SC Dong, CL Zhang, S Kim… - Proceedings of the 50th …, 2023 - dl.acm.org
Robotic systems, such as autonomous unmanned aerial vehicles (UAVs) and self-driving
cars, have been widely deployed in many scenarios and have the potential to revolutionize …

Peta-scale embedded photonics architecture for distributed deep learning applications

Z Wu, LY Dai, A Novick, M Glick, Z Zhu… - Journal of Lightwave …, 2023 - ieeexplore.ieee.org
As Deep Learning (DL) models grow larger and more complex, training jobs are
increasingly distributed across multiple Computing Units (CUs) such as GPUs and TPUs …

Communication algorithm-architecture co-design for distributed deep learning

J Huang, P Majumder, S Kim, A Muzahid… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Large-scale distributed deep learning training has enabled the development of more complex
deep neural network models that learn from larger datasets for sophisticated tasks. In …

Efficient distributed inference of deep neural networks via restructuring and pruning

A Abdi, S Rashidi, F Fekri, T Krishna - Proceedings of the AAAI …, 2023 - ojs.aaai.org
In this paper, we consider the parallel implementation of an already-trained deep model on
multiple processing nodes (a.k.a. workers). Specifically, we investigate how a deep …

Logical/physical topology-aware collective communication in deep learning training

S Cho, H Son, J Kim - 2023 IEEE International Symposium on …, 2023 - ieeexplore.ieee.org
Training is an essential step in deep learning that enables network models to be deployed.
To scale training, multiple GPUs are commonly used with data parallelism to exploit the …