Measuring the effects of data parallelism on neural network training

CJ Shallue, J Lee, J Antognini, J Sohl-Dickstein… - Journal of Machine …, 2019 - jmlr.org
Recent hardware developments have dramatically increased the scale of data parallelism
available for neural network training. Among the simplest ways to harness next-generation …
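
One of the simplest ways to use that parallelism is synchronous gradient averaging across workers. A minimal NumPy sketch on an arbitrary least-squares problem (the model, shard size, and worker count are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Minimal sketch of synchronous data-parallel SGD on a linear least-squares
# model: each "worker" computes a gradient on its shard of the mini-batch,
# the gradients are averaged, and a single update is applied. This is
# equivalent to one SGD step on the combined batch. Names are illustrative.

rng = np.random.default_rng(0)
d, n_workers, shard_size = 10, 4, 32

w_true = rng.normal(size=d)
w = np.zeros(d)
lr = 0.1

def shard_gradient(w, X, y):
    """Mean-squared-error gradient on one worker's shard."""
    residual = X @ w - y
    return X.T @ residual / len(y)

for _ in range(100):
    grads = []
    for _ in range(n_workers):
        X = rng.normal(size=(shard_size, d))            # this worker's data shard
        y = X @ w_true + 0.1 * rng.normal(size=shard_size)
        grads.append(shard_gradient(w, X, y))
    w -= lr * np.mean(grads, axis=0)                    # all-reduce (average) + update

print("parameter error:", np.linalg.norm(w - w_true))
```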

Which algorithmic choices matter at which batch sizes? insights from a noisy quadratic model

G Zhang, L Li, Z Nado, J Martens… - Advances in neural …, 2019 - proceedings.neurips.cc
Increasing the batch size is a popular way to speed up neural network training, but beyond
some critical batch size, larger batch sizes yield diminishing returns. In this work, we study …
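
The diminishing-returns effect can be illustrated on a toy noisy quadratic: the per-coordinate expected SGD loss follows a simple recursion, and grid-searching a constant learning rate for each batch size shows the steps-to-target count shrinking with B and then flattening. A hedged sketch; the curvature spectrum, noise model, and target below are arbitrary, not the settings analyzed in the paper:

```python
import numpy as np

# Toy "noisy quadratic" experiment: for loss 0.5 * sum_i h_i * w_i^2 with
# per-example gradient noise of variance h_i per coordinate, the expected
# squared parameters under SGD evolve as
#   E[w_i^2]  <-  (1 - lr*h_i)^2 * E[w_i^2] + lr^2 * h_i / B.
# For each batch size B we grid-search a constant learning rate and count
# the steps needed to reach a target expected loss.

h = np.geomspace(0.1, 1.0, 10)           # curvature spectrum (arbitrary)
target, max_steps = 1e-2, 20_000

def steps_to_target(lr, B):
    ew2 = np.ones_like(h)                # E[w_i^2] at initialization
    for step in range(1, max_steps + 1):
        ew2 = (1 - lr * h) ** 2 * ew2 + lr ** 2 * h / B
        if 0.5 * np.sum(h * ew2) < target:
            return step
    return max_steps

for B in [1, 4, 16, 64, 256, 1024, 4096]:
    best = min(steps_to_target(lr, B) for lr in np.geomspace(1e-3, 1.9, 40))
    print(f"B={B:5d}  steps to target loss (best constant lr): {best}")
```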

SBXception: a shallower and broader Xception architecture for efficient classification of skin lesions

A Mehmood, Y Gulzar, QM Ilyas, A Jabbari, M Ahmad… - Cancers, 2023 - mdpi.com
Simple Summary: Skin cancer is a major concern worldwide, and accurately identifying it is
crucial for effective treatment. We propose a modified deep learning model called …

SparCML: High-performance sparse communication for machine learning

C Renggli, S Ashkboos, M Aghagolzadeh… - Proceedings of the …, 2019 - dl.acm.org
Applying machine learning techniques to the quickly growing data in science and industry
requires highly scalable algorithms. Large datasets are most commonly processed "data …
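
As a rough illustration of why sparse communication helps, the sketch below sends only the top-k gradient entries per worker and keeps an error-feedback residual for what was not sent. This is a generic sparsification scheme for intuition only, not SparCML's actual protocol or data structures:

```python
import numpy as np

# Generic top-k gradient sparsification with error feedback: each worker
# transmits only its k largest-magnitude gradient entries (as index/value
# pairs on the wire) and accumulates the untransmitted remainder locally.

rng = np.random.default_rng(0)
d, n_workers, k = 1_000, 8, 50

residual = np.zeros((n_workers, d))      # per-worker error-feedback buffers

def sparsify(grad, k):
    """Return (indices, values) of the k largest-magnitude entries."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def sparse_allreduce(grads):
    """Average sparsified gradients (dense here; index/value pairs in practice)."""
    total = np.zeros(d)
    for w, g in enumerate(grads):
        compensated = g + residual[w]
        idx, vals = sparsify(compensated, k)
        sent = np.zeros(d)
        sent[idx] = vals
        residual[w] = compensated - sent   # keep what was not transmitted
        total += sent
    return total / len(grads)

# One simulated step: random gradients, compare dense vs. sparse averages.
grads = [rng.normal(size=d) for _ in range(n_workers)]
dense_avg = np.mean(grads, axis=0)
sparse_avg = sparse_allreduce(grads)
print("sent entries per worker:", k, "of", d)
print("cosine(dense, sparse):",
      sparse_avg @ dense_avg / (np.linalg.norm(sparse_avg) * np.linalg.norm(dense_avg)))
```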

A novel multi-branch channel expansion network for garbage image classification

C Shi, R Xia, L Wang - IEEE access, 2020 - ieeexplore.ieee.org
Due to the lack of data available for training, deep learning has hardly performed well in the
field of garbage image classification. We choose the TrashNet dataset, which is widely used in …

Critical temperature prediction for a superconductor: A variational bayesian neural network approach

TD Le, R Noumeir, HL Quach, JH Kim… - IEEE Transactions …, 2020 - ieeexplore.ieee.org
Much research in recent years has focused on using empirical machine learning
approaches to extract useful insights on the structure-property relationships of …

Critical parameters for scalable distributed learning with large batches and asynchronous updates

S Stich, A Mohtashami, M Jaggi - … Conference on Artificial …, 2021 - proceedings.mlr.press
It has been experimentally observed that the efficiency of distributed training with stochastic
gradient descent (SGD) depends decisively on the batch size and, in asynchronous …
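
For intuition about the asynchronous side, the sketch below runs SGD on a 1-D quadratic where the server applies gradients computed at parameters that are tau steps stale. The delay model, batch size, and constants are assumptions chosen for illustration, not the paper's setting:

```python
import numpy as np
from collections import deque

# Asynchronous SGD on a 1-D quadratic: workers compute gradients at stale
# parameter values (delay tau), and the server applies them as they arrive.
# Staleness limits the usable learning rate alongside the batch size.

rng = np.random.default_rng(0)
h, sigma, lr, tau, B, steps = 1.0, 1.0, 0.1, 4, 16, 500

w = 1.0
history = deque([w] * (tau + 1), maxlen=tau + 1)    # parameters tau steps back

losses = []
for _ in range(steps):
    w_stale = history[0]                            # worker read an old copy
    grad = h * w_stale + sigma / np.sqrt(B) * rng.normal()
    w -= lr * grad                                  # server applies the stale gradient
    history.append(w)
    losses.append(0.5 * h * w ** 2)

print(f"delay tau={tau}, batch B={B}: final loss {losses[-1]:.4f}, "
      f"mean of last 100 losses {np.mean(losses[-100:]):.4f}")
```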

Improving scalability of parallel CNN training by adjusting mini-batch size at run-time

S Lee, Q Kang, S Madireddy… - … Conference on Big …, 2019 - ieeexplore.ieee.org
Training a Convolutional Neural Network (CNN) is a computationally intensive task, requiring
efficient parallelization to shorten the execution time. Considering the ever-increasing size of …
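
The paper's run-time adjustment rule is its own; purely as an illustration of the general idea, the sketch below starts with a small mini-batch and doubles it whenever a running training-loss window stops improving. The plateau test, growth factor, and toy problem are all assumptions:

```python
import numpy as np

# Growing the mini-batch size during training: start small for cheap, noisy
# progress and double the batch when the running training loss plateaus.

rng = np.random.default_rng(0)
d, n = 20, 10_000
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lr, batch_size, max_batch = 0.05, 8, 1024
recent, window = [], 50

for _ in range(2_000):
    idx = rng.integers(0, n, size=batch_size)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
    w -= lr * grad

    recent.append(float(np.mean((X[idx] @ w - y[idx]) ** 2)))
    if len(recent) >= 2 * window and batch_size < max_batch:
        old = np.mean(recent[-2 * window:-window])
        new = np.mean(recent[-window:])
        if new > 0.99 * old:             # loss plateaued: double the batch
            batch_size *= 2
            recent.clear()

print("final batch size:", batch_size,
      " parameter error:", round(float(np.linalg.norm(w - w_true)), 4))
```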

When and why momentum accelerates SGD: An empirical study

J Fu, B Wang, H Zhang, Z Zhang, W Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Momentum has become a crucial component in deep learning optimizers, necessitating a
comprehensive understanding of when and why it accelerates stochastic gradient descent …
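
For reference, the update in question is heavy-ball momentum: v <- mu*v + g followed by w <- w - lr*v. A minimal deterministic sketch on an ill-conditioned quadratic (curvatures and hyperparameters are arbitrary) shows the acceleration over plain gradient descent:

```python
import numpy as np

# Heavy-ball momentum vs. plain gradient descent on an ill-conditioned
# quadratic: a deterministic toy version of the update whose stochastic
# form the paper studies empirically.

h = np.array([1.0, 0.01])                 # two curvatures, condition number 100
w0 = np.array([1.0, 1.0])
w_gd, w_mom, v = w0.copy(), w0.copy(), np.zeros(2)
lr, mu, steps = 1.0, 0.9, 200

loss = lambda w: 0.5 * np.sum(h * w ** 2)
for _ in range(steps):
    w_gd = w_gd - lr * h * w_gd            # plain gradient descent
    v = mu * v + h * w_mom                 # momentum buffer accumulates gradients
    w_mom = w_mom - lr * v

print(f"after {steps} steps: plain GD loss {loss(w_gd):.2e}, "
      f"momentum loss {loss(w_mom):.2e}")
```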

A new perspective for understanding generalization gap of deep neural networks trained with large batch sizes

OK Oyedotun, K Papadopoulos, D Aouada - Applied Intelligence, 2023 - Springer
Deep neural networks (DNNs) are typically optimized using various forms of the mini-batch
gradient descent algorithm. A major motivation for mini-batch gradient descent is that with a …
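
The motivation mentioned here, that a mini-batch gradient is a cheap but noisy estimate of the full-batch gradient whose noise shrinks as the batch grows, can be checked directly. A small sketch on arbitrary toy data, not the experiments of the paper:

```python
import numpy as np

# Mini-batch gradients are unbiased estimates of the full-batch gradient,
# with estimation error shrinking roughly like 1/sqrt(B). Larger batches
# therefore take less noisy steps, the starting point for the
# generalization-gap discussion.

rng = np.random.default_rng(0)
n, d = 50_000, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
w = np.zeros(d)

def gradient(idx):
    """Mean-squared-error gradient over the rows indexed by idx."""
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

full_grad = gradient(np.arange(n))
for B in [8, 64, 512, 4096]:
    errs = [np.linalg.norm(gradient(rng.integers(0, n, size=B)) - full_grad)
            for _ in range(200)]
    print(f"B={B:5d}  mean gradient estimation error: {np.mean(errs):.3f}")
```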