TBD: Benchmarking and analyzing deep neural network training

M Mazumder, C Banbury, X Yao… - Advances in …, 2023 - proceedings.neurips.cc

Abstract Machine learning research has long focused on models rather than datasets, and
prominent datasets are used for common ML tasks without regard to the breadth, difficulty …

Cited by 132 Related articles All 6 versions

[PDF] arxiv.org

A survey of on-device machine learning: An algorithms and learning theory perspective

S Dhar, J Guo, J Liu, S Tripathi, U Kurup… - ACM Transactions on …, 2021 - dl.acm.org

The predominant paradigm for using machine learning models on a device is to train a
model in the cloud and perform inference using the trained model on the device. However …

Cited by 206 Related articles All 3 versions

[PDF] mlsys.org

Priority-based parameter propagation for distributed DNN training

A Jayarajan, J Wei, G Gibson… - Proceedings of …, 2019 - proceedings.mlsys.org

Data parallel training is widely used for scaling distributed deep neural network (DNN)
training. However, the performance benefits are often limited by the communication-heavy …

Cited by 194 Related articles All 9 versions

[PDF] toronto.edu

Gist: Efficient data encoding for deep neural network training

A Jain, A Phanishayee, J Mars, L Tang… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org

Modern deep neural networks (DNNs) training typically relies on GPUs to train complex
hundred-layer deep networks. A significant problem facing both researchers and industry …

Cited by 207 Related articles All 8 versions

[PDF] arxiv.org

Analysis of DAWNBench, a time-to-accuracy machine learning performance benchmark

C Coleman, D Kang, D Narayanan, L Nardi… - ACM SIGOPS …, 2019 - dl.acm.org

Researchers have proposed hardware, software, and algorithmic optimizations to improve
the computational performance of deep learning. While some of these optimizations perform …

Cited by 147 Related articles All 12 versions

[PDF] acm.org

Parameter hub: a rack-scale parameter server for distributed deep neural network training

L Luo, J Nelson, L Ceze, A Phanishayee… - Proceedings of the …, 2018 - dl.acm.org

Distributed deep neural network (DDNN) training constitutes an increasingly important
workload that frequently runs in the cloud. Larger DNN models and faster compute engines …

Cited by 153 Related articles All 6 versions

[PDF] springer.com

DLBench: a comprehensive experimental evaluation of deep learning frameworks

R Elshawi, A Wahab, A Barnawi, S Sakr - Cluster Computing, 2021 - Springer

Deep Learning (DL) has achieved remarkable progress over the last decade on various
tasks such as image recognition, speech recognition, and natural language processing. In …

Cited by 65 Related articles All 5 versions

[PDF] arxiv.org

An overview of the data-loader landscape: Comparative performance analysis

I Ofeidis, D Kiedanski, L Tassiulas - arXiv preprint arXiv:2209.13705, 2022 - arxiv.org

Dataloaders, in charge of moving data from storage into GPUs while training machine
learning models, might hold the key to drastically improving the performance of training jobs …

Cited by 12 Related articles All 2 versions

[PDF] arxiv.org

A modular benchmarking infrastructure for high-performance and reproducible deep learning

T Ben-Nun, M Besta, S Huber… - 2019 IEEE …, 2019 - ieeexplore.ieee.org

We introduce Deep500: the first customizable benchmarking infrastructure that enables fair
comparison of the plethora of deep learning frameworks, algorithms, libraries, and …

Cited by 93 Related articles All 25 versions

[HTML] peerj.com

[HTML] Effect of neural network structure in accelerating performance and accuracy of a convolutional neural network with GPU/TPU for image analytics

A Ravikumar, H Sriraman, PMS Saketh… - PeerJ Computer …, 2022 - peerj.com

Background In deep learning the most significant breakthrough in the field of image
recognition, object detection language processing was done by Convolutional Neural …

Cited by 35 Related articles All 8 versions
