On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …

On Efficient Training of Large-Scale Deep Learning Models

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - ACM Computing Surveys, 2024 - dl.acm.org
The field of deep learning has witnessed significant progress in recent times, particularly in
areas such as computer vision (CV), natural language processing (NLP), and speech. The …

Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy

C Tan, J Zhang, J Liu, Y Wang, Y Hao - arXiv preprint arXiv:2401.07250, 2024 - arxiv.org
Recently, sharpness-aware minimization (SAM) has attracted a lot of attention because of its
surprising effectiveness in improving generalization performance. However, training neural …

Improving SAM requires rethinking its optimization formulation

W Xie, F Latorre, K Antonakopoulos, T Pethick… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper rethinks Sharpness-Aware Minimization (SAM), which is originally formulated as
a zero-sum game where the weights of a network and a bounded perturbation try to …

Comprehensive survey on the effectiveness of sharpness aware minimization and its progressive variants

J Rostand, CCJ Hsu, CK Lu - Journal of the Chinese Institute of …, 2024 - Taylor & Francis
As advancements push for larger and more complex Artificial Intelligence (AI) models to
improve performance, preventing the occurrence of overfitting when training …

Bilateral Sharpness-Aware Minimization for Flatter Minima

J Deng, J Pang, B Zhang, Q Huang - arXiv preprint arXiv:2409.13173, 2024 - arxiv.org
Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-
Sharpness (MaxS). Despite the practical success, we empirically found that the MaxS …
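
The entry above describes SAM's core mechanism: minimize the loss at an adversarially perturbed point within a small radius of the current weights. A minimal sketch of that two-step update, using plain NumPy on a toy quadratic loss (the names `sam_step`, `rho`, and `lr` are illustrative choices, not taken from any of the papers listed here):

```python
import numpy as np

def loss(w):
    # Toy quadratic loss, chosen only to make the sketch self-contained.
    return 0.5 * np.sum(w ** 2)

def grad(w):
    # Gradient of the quadratic loss above.
    return w

def sam_step(w, rho=0.05, lr=0.1):
    """One SAM update: ascend to an approximate worst-case point in the
    rho-ball, then descend using the gradient evaluated there."""
    g = grad(w)
    # Ascent step: first-order approximation of the worst-case perturbation.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent step with the "sharpness-aware" gradient.
    g_sharp = grad(w + eps)
    return w - lr * g_sharp

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w)
```

In a deep-learning framework the same pattern costs two forward-backward passes per update, one at `w + eps` and one implicit in computing `eps`; the variants surveyed in these entries largely differ in how that inner maximization is formulated or stabilized.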

Multi-head CLIP: Improving CLIP with diverse representations and flat minima

M Zhou, X Zhou, E Li, S Ermon, R Ge - 2023 - amazon.science
Contrastive Language-Image Pre-training (CLIP) has shown remarkable success in
the field of multimodal learning by enabling joint understanding of text and images. In this …

Domain-Generalization to Improve Learning in Meta-Learning Algorithms

U Anjum, C Stockman, C Luong, J Zhan - Available at SSRN 4992443 - papers.ssrn.com
In this paper, we propose a novel meta-learning algorithm for learning new tasks from a
small number of training samples called Domain Generalization Sharpness-Aware …