Measuring the effects of data parallelism on neural network training

CJ Shallue, J Lee, J Antognini, J Sohl-Dickstein… - Journal of Machine …, 2019 - jmlr.org
Recent hardware developments have dramatically increased the scale of data parallelism
available for neural network training. Among the simplest ways to harness next-generation …
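
One of the simplest ways to use that parallelism is synchronous gradient averaging across workers. A minimal NumPy sketch on an arbitrary least-squares problem (the model, shard size, and worker count are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Minimal sketch of synchronous data-parallel SGD on a linear least-squares
# model: each "worker" computes a gradient on its shard of the mini-batch,
# the gradients are averaged, and a single update is applied. This is
# equivalent to one SGD step on the combined batch. Names are illustrative.

rng = np.random.default_rng(0)
d, n_workers, shard_size = 10, 4, 32

w_true = rng.normal(size=d)
w = np.zeros(d)
lr = 0.1

def shard_gradient(w, X, y):
    """Mean-squared-error gradient on one worker's shard."""
    residual = X @ w - y
    return X.T @ residual / len(y)

for _ in range(100):
    grads = []
    for _ in range(n_workers):
        X = rng.normal(size=(shard_size, d))            # this worker's data shard
        y = X @ w_true + 0.1 * rng.normal(size=shard_size)
        grads.append(shard_gradient(w, X, y))
    w -= lr * np.mean(grads, axis=0)                    # all-reduce (average) + update

print("parameter error:", np.linalg.norm(w - w_true))
```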

Which algorithmic choices matter at which batch sizes? insights from a noisy quadratic model

G Zhang, L Li, Z Nado, J Martens… - Advances in neural …, 2019 - proceedings.neurips.cc
Increasing the batch size is a popular way to speed up neural network training, but beyond
some critical batch size, larger batch sizes yield diminishing returns. In this work, we study …
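
The diminishing-returns effect can be illustrated on a toy noisy quadratic: the per-coordinate expected SGD loss follows a simple recursion, and grid-searching a constant learning rate for each batch size shows the steps-to-target count shrinking with B and then flattening. A hedged sketch; the curvature spectrum, noise model, and target below are arbitrary, not the settings analyzed in the paper:

```python
import numpy as np

# Toy "noisy quadratic" experiment: for loss 0.5 * sum_i h_i * w_i^2 with
# per-example gradient noise of variance h_i per coordinate, the expected
# squared parameters under SGD evolve as
#   E[w_i^2]  <-  (1 - lr*h_i)^2 * E[w_i^2] + lr^2 * h_i / B.
# For each batch size B we grid-search a constant learning rate and count
# the steps needed to reach a target expected loss.

h = np.geomspace(0.1, 1.0, 10)           # curvature spectrum (arbitrary)
target, max_steps = 1e-2, 20_000

def steps_to_target(lr, B):
    ew2 = np.ones_like(h)                # E[w_i^2] at initialization
    for step in range(1, max_steps + 1):
        ew2 = (1 - lr * h) ** 2 * ew2 + lr ** 2 * h / B
        if 0.5 * np.sum(h * ew2) < target:
            return step
    return max_steps

for B in [1, 4, 16, 64, 256, 1024, 4096]:
    best = min(steps_to_target(lr, B) for lr in np.geomspace(1e-3, 1.9, 40))
    print(f"B={B:5d}  steps to target loss (best constant lr): {best}")
```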

SBXception: a shallower and broader Xception architecture for efficient classification of skin lesions

A Mehmood, Y Gulzar, QM Ilyas, A Jabbari, M Ahmad… - Cancers, 2023 - mdpi.com
Simple Summary: Skin cancer is a major concern worldwide, and accurately identifying it is
crucial for effective treatment. We propose a modified deep learning model called …

SparCML: High-performance sparse communication for machine learning

C Renggli, S Ashkboos, M Aghagolzadeh… - Proceedings of the …, 2019 - dl.acm.org
Applying machine learning techniques to the quickly growing data in science and industry
requires highly scalable algorithms. Large datasets are most commonly processed "data …
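
As a rough illustration of why sparse communication helps, the sketch below sends only the top-k gradient entries per worker and keeps an error-feedback residual for what was not sent. This is a generic sparsification scheme for intuition only, not SparCML's actual protocol or data structures:

```python
import numpy as np

# Generic top-k gradient sparsification with error feedback: each worker
# transmits only its k largest-magnitude gradient entries (as index/value
# pairs on the wire) and accumulates the untransmitted remainder locally.

rng = np.random.default_rng(0)
d, n_workers, k = 1_000, 8, 50

residual = np.zeros((n_workers, d))      # per-worker error-feedback buffers

def sparsify(grad, k):
    """Return (indices, values) of the k largest-magnitude entries."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def sparse_allreduce(grads):
    """Average sparsified gradients (dense here; index/value pairs in practice)."""
    total = np.zeros(d)
    for w, g in enumerate(grads):
        compensated = g + residual[w]
        idx, vals = sparsify(compensated, k)
        sent = np.zeros(d)
        sent[idx] = vals
        residual[w] = compensated - sent   # keep what was not transmitted
        total += sent
    return total / len(grads)

# One simulated step: random gradients, compare dense vs. sparse averages.
grads = [rng.normal(size=d) for _ in range(n_workers)]
dense_avg = np.mean(grads, axis=0)
sparse_avg = sparse_allreduce(grads)
print("sent entries per worker:", k, "of", d)
print("cosine(dense, sparse):",
      sparse_avg @ dense_avg / (np.linalg.norm(sparse_avg) * np.linalg.norm(dense_avg)))
```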

A novel multi-branch channel expansion network for garbage image classification

C Shi, R Xia, L Wang - IEEE access, 2020 - ieeexplore.ieee.org
Due to the lack of data available for training, deep learning has hardly performed well in the
field of garbage image classification. We choose the TrashNet dataset, which is widely used in …

Critical temperature prediction for a superconductor: A variational bayesian neural network approach

TD Le, R Noumeir, HL Quach, JH Kim… - IEEE Transactions …, 2020 - ieeexplore.ieee.org
Much research in recent years has focused on using empirical machine learning
approaches to extract useful insights on the structure-property relationships of …

Critical parameters for scalable distributed learning with large batches and asynchronous updates

S Stich, A Mohtashami, M Jaggi - … Conference on Artificial …, 2021 - proceedings.mlr.press
It has been experimentally observed that the efficiency of distributed training with stochastic
gradient descent (SGD) depends decisively on the batch size and, in asynchronous …
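
For intuition about the asynchronous side, the sketch below runs SGD on a 1-D quadratic where the server applies gradients computed at parameters that are tau steps stale. The delay model, batch size, and constants are assumptions chosen for illustration, not the paper's setting:

```python
import numpy as np
from collections import deque

# Asynchronous SGD on a 1-D quadratic: workers compute gradients at stale
# parameter values (delay tau), and the server applies them as they arrive.
# Staleness limits the usable learning rate alongside the batch size.

rng = np.random.default_rng(0)
h, sigma, lr, tau, B, steps = 1.0, 1.0, 0.1, 4, 16, 500

w = 1.0
history = deque([w] * (tau + 1), maxlen=tau + 1)    # parameters tau steps back

losses = []
for _ in range(steps):
    w_stale = history[0]                            # worker read an old copy
    grad = h * w_stale + sigma / np.sqrt(B) * rng.normal()
    w -= lr * grad                                  # server applies the stale gradient
    history.append(w)
    losses.append(0.5 * h * w ** 2)

print(f"delay tau={tau}, batch B={B}: final loss {losses[-1]:.4f}, "
      f"mean of last 100 losses {np.mean(losses[-100:]):.4f}")
```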

Improving scalability of parallel CNN training by adjusting mini-batch size at run-time

S Lee, Q Kang, S Madireddy… - … Conference on Big …, 2019 - ieeexplore.ieee.org
Training a Convolutional Neural Network (CNN) is a computationally intensive task, requiring
efficient parallelization to shorten the execution time. Considering the ever-increasing size of …
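
The paper's run-time adjustment rule is its own; purely as an illustration of the general idea, the sketch below starts with a small mini-batch and doubles it whenever a running training-loss window stops improving. The plateau test, growth factor, and toy problem are all assumptions:

```python
import numpy as np

# Growing the mini-batch size during training: start small for cheap, noisy
# progress and double the batch when the running training loss plateaus.

rng = np.random.default_rng(0)
d, n = 20, 10_000
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
lr, batch_size, max_batch = 0.05, 8, 1024
recent, window = [], 50

for _ in range(2_000):
    idx = rng.integers(0, n, size=batch_size)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
    w -= lr * grad

    recent.append(float(np.mean((X[idx] @ w - y[idx]) ** 2)))
    if len(recent) >= 2 * window and batch_size < max_batch:
        old = np.mean(recent[-2 * window:-window])
        new = np.mean(recent[-window:])
        if new > 0.99 * old:             # loss plateaued: double the batch
            batch_size *= 2
            recent.clear()

print("final batch size:", batch_size,
      " parameter error:", round(float(np.linalg.norm(w - w_true)), 4))
```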

When and why momentum accelerates SGD: An empirical study

J Fu, B Wang, H Zhang, Z Zhang, W Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Momentum has become a crucial component in deep learning optimizers, necessitating a
comprehensive understanding of when and why it accelerates stochastic gradient descent …
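
For reference, the update in question is heavy-ball momentum: v <- mu*v + g followed by w <- w - lr*v. A minimal deterministic sketch on an ill-conditioned quadratic (curvatures and hyperparameters are arbitrary) shows the acceleration over plain gradient descent:

```python
import numpy as np

# Heavy-ball momentum vs. plain gradient descent on an ill-conditioned
# quadratic: a deterministic toy version of the update whose stochastic
# form the paper studies empirically.

h = np.array([1.0, 0.01])                 # two curvatures, condition number 100
w0 = np.array([1.0, 1.0])
w_gd, w_mom, v = w0.copy(), w0.copy(), np.zeros(2)
lr, mu, steps = 1.0, 0.9, 200

loss = lambda w: 0.5 * np.sum(h * w ** 2)
for _ in range(steps):
    w_gd = w_gd - lr * h * w_gd            # plain gradient descent
    v = mu * v + h * w_mom                 # momentum buffer accumulates gradients
    w_mom = w_mom - lr * v

print(f"after {steps} steps: plain GD loss {loss(w_gd):.2e}, "
      f"momentum loss {loss(w_mom):.2e}")
```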

A new perspective for understanding generalization gap of deep neural networks trained with large batch sizes

OK Oyedotun, K Papadopoulos, D Aouada - Applied Intelligence, 2023 - Springer
Deep neural networks (DNNs) are typically optimized using various forms of the mini-batch
gradient descent algorithm. A major motivation for mini-batch gradient descent is that with a …
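
The motivation mentioned here, that a mini-batch gradient is a cheap but noisy estimate of the full-batch gradient whose noise shrinks as the batch grows, can be checked directly. A small sketch on arbitrary toy data, not the experiments of the paper:

```python
import numpy as np

# Mini-batch gradients are unbiased estimates of the full-batch gradient,
# with estimation error shrinking roughly like 1/sqrt(B). Larger batches
# therefore take less noisy steps, the starting point for the
# generalization-gap discussion.

rng = np.random.default_rng(0)
n, d = 50_000, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
w = np.zeros(d)

def gradient(idx):
    """Mean-squared-error gradient over the rows indexed by idx."""
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

full_grad = gradient(np.arange(n))
for B in [8, 64, 512, 4096]:
    errs = [np.linalg.norm(gradient(rng.integers(0, n, size=B)) - full_grad)
            for _ in range(200)]
    print(f"B={B:5d}  mean gradient estimation error: {np.mean(errs):.3f}")
```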