Loss spike in training neural networks

M Liu, J Tang, Y Chen, H Li, J Qi, S Li, K Wang, J Gan… - Neural Networks, 2025 - Elsevier

Artificial neural networks (ANNs) can help camera-based remote photoplethysmography
(rPPG) in measuring cardiac activity and physiological signals from facial videos, such as …

Cited by 10 · Related articles · All 3 versions

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning

L Zhu, C Liu, A Radhakrishnan, M Belkin - arXiv preprint arXiv:2306.04815, 2023 - arxiv.org

In this paper, we first present an explanation regarding the common occurrence of spikes in
the training loss when neural networks are trained with stochastic gradient descent (SGD) …

Cited by 14 · Related articles · All 4 versions

Simultaneous Dimensionality Reduction for Extracting Useful Representations of Large Empirical Multimodal Datasets

E Abdelaleem - arXiv preprint arXiv:2410.19867, 2024 - arxiv.org

The quest for simplification in physics drives the exploration of concise mathematical
representations for complex systems. This Dissertation focuses on the concept of …

SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization

K Park, S Lee - arXiv preprint arXiv:2412.08894, 2024 - arxiv.org

We propose SMMF (Square-Matricized Momentum Factorization), a memory-efficient
optimizer that reduces the memory requirement of the widely used adaptive learning rate …

On the Relationship Between Double Descent of CNNs and Shape/Texture Bias Under Learning Process

S Iwase, S Takahashi, N Inoue, R Yokota… - … Conference on Pattern …, 2025 - Springer

The double descent phenomenon, which deviates from the traditional bias-variance trade-off
theory, attracts considerable research attention; however, the mechanism of its occurrence is …

Toward Understanding the Dynamics of Over-parameterized Neural Networks

L Zhu - 2024 - search.proquest.com

The practical applications of neural networks are vast and varied, yet a comprehensive
understanding of their underlying principles remains incomplete. This dissertation advances …

Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - The Thirty-eighth Annual … - openreview.net

Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …
