Demystifying TensorRT: Characterizing neural network inference engine on NVIDIA edge devices

O Shafi, C Rai, R Sen… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Edge devices are seeing tremendous growth in sensing and computational capabilities.
Running state-of-the-art deep neural network (NN) based data processing on multi-core …

Deep learning at scale on NVIDIA V100 accelerators

R Xu, F Han, Q Ta - 2018 IEEE/ACM Performance Modeling …, 2018 - ieeexplore.ieee.org
The recent explosion in the popularity of Deep Learning (DL) is due to a combination of
improved algorithms, access to large datasets and increased computational power. This had …

Habitat: A Runtime-Based computational performance predictor for deep neural network training

GX Yu, Y Gao, P Golikov… - 2021 USENIX Annual …, 2021 - usenix.org
Deep learning researchers and practitioners usually leverage GPUs to help train their deep
neural networks (DNNs) faster. However, choosing which GPU to use is challenging both …

Performance characterization of DNN training using TensorFlow and PyTorch on modern clusters

A Jain, AA Awan, Q Anthony… - 2019 IEEE …, 2019 - ieeexplore.ieee.org
The recent surge of Deep Learning (DL) models and applications can be attributed to the
rise in computational resources, availability of large-scale datasets, and accessible DL …

DeepEdgeBench: Benchmarking deep neural networks on edge devices

SP Baller, A Jindal, M Chadha… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
EdgeAI (Edge computing based Artificial Intelligence) has been most actively researched for
the last few years to handle a variety of massively distributed AI applications to meet up the …

Benchmarking TPU, GPU, and CPU platforms for deep learning

YE Wang, GY Wei, D Brooks - arXiv preprint arXiv:1907.10701, 2019 - arxiv.org
Training deep learning models is compute-intensive and there is an industry-wide trend
towards hardware specialization to improve performance. To systematically benchmark …

Characterizing the performance of accelerated Jetson edge devices for training deep learning models

P SK, SA Kesanapalli, Y Simmhan - … of the ACM on Measurement and …, 2022 - dl.acm.org
Deep Neural Networks (DNNs) have had a significant impact on domains like autonomous
vehicles and smart cities through low-latency inferencing on edge computing devices close …

Automating deep neural network model selection for edge inference

B Lu, J Yang, LY Chen, S Ren - 2019 IEEE First International …, 2019 - ieeexplore.ieee.org
The ever increasing size of deep neural network (DNN) models once implied that they were
only limited to cloud data centers for runtime inference. Nonetheless, the recent plethora of …

An in-depth performance characterization of CPU- and GPU-based DNN training on modern architectures

AA Awan, H Subramoni, DK Panda - … of the Machine Learning on HPC …, 2017 - dl.acm.org
Traditionally, Deep Learning (DL) frameworks like Caffe, TensorFlow, and Cognitive Toolkit
exploited GPUs to accelerate the training process. This has been primarily achieved by …

Automated runtime-aware scheduling for multi-tenant DNN inference on GPU

F Yu, S Bray, D Wang, L Shangguan… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
With the fast development of deep neural networks (DNNs), many real-world applications
are adopting multiple models to conduct compound tasks, such as co-running classification …