Performance modeling and evaluation of distributed deep learning frameworks on GPUs

S Shi, Q Wang, X Chu - 2018 IEEE 16th Intl Conf on …, 2018 - ieeexplore.ieee.org
Deep learning frameworks have been widely deployed on GPU servers for deep learning
applications in both academia and industry. In training deep neural networks (DNNs), there …

Performance prediction of GPU-based deep learning applications

E Gianniti, L Zhang, D Ardagna - 2018 30th International …, 2018 - ieeexplore.ieee.org
Recent years saw an increasing success in the application of deep learning methods across
various domains and for tackling different problems, ranging from image recognition and …

Pipe-torch: Pipeline-based distributed deep learning in a GPU cluster with heterogeneous networking

J Zhan, J Zhang - … Conference on Advanced Cloud and Big …, 2019 - ieeexplore.ieee.org
Because training a deep neural network (DNN) takes arduous amounts of time and
computation, often researchers expedite the training process via distributed parallel training …

Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters

H Zhang, Z Zheng, S Xu, W Dai, Q Ho, X Liang… - 2017 USENIX Annual …, 2017 - usenix.org
Deep learning models can take weeks to train on a single GPU-equipped machine,
necessitating scaling out DL training to a GPU-cluster. However, current distributed DL …

Performance comparison of TPU, GPU, CPU on Google Colaboratory over distributed deep learning

H Kimm, I Paik, H Kimm - … Many-core Systems-on-Chip (MCSoC …, 2021 - ieeexplore.ieee.org
Deep learning models need massive amounts of compute power and tend to improve
performance when running on special-purpose accelerators designed to speed up …

Performance analysis of GPU-based convolutional neural networks

X Li, G Zhang, HH Huang, Z Wang… - 2016 45th International …, 2016 - ieeexplore.ieee.org
As one of the most important deep learning models, convolutional neural networks (CNNs)
have achieved great successes in a number of applications such as image classification …

Performance characterization of DNN training using TensorFlow and PyTorch on modern clusters

A Jain, AA Awan, Q Anthony… - 2019 IEEE …, 2019 - ieeexplore.ieee.org
The recent surge of Deep Learning (DL) models and applications can be attributed to the
rise in computational resources, availability of large-scale datasets, and accessible DL …

Profiling DNN workloads on a Volta-based DGX-1 system

SA Mojumder, MS Louis, Y Sun… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
High performance multi-GPU systems are widely used to accelerate training of deep neural
networks (DNNs) by exploiting the inherently massive parallel nature of the training process …

Benchmarking state-of-the-art deep learning software tools

S Shi, Q Wang, P Xu, X Chu - 2016 7th International …, 2016 - ieeexplore.ieee.org
Deep learning has been shown as a successful machine learning method for a variety of
tasks, and its popularity results in numerous open-source deep learning software tools …

Benchmarking GPU-accelerated edge devices

J Jo, S Jeong, P Kang - … conference on big data and smart …, 2020 - ieeexplore.ieee.org
We evaluate one of the state-of-the-art GPU-accelerated edge devices in this paper. We
perform a set of deep learning benchmarks on the device to measure its performance. By …