The Marginal Value of Momentum for Small Learning Rate SGD

R Wang, S Malladi, T Wang, K Lyu, Z Li - arXiv preprint arXiv:2307.15196, 2023 - arxiv.org
Momentum is known to accelerate the convergence of gradient descent in strongly convex
settings without stochastic gradient noise. In stochastic optimization, such as training neural …

A Small Object Detection Method for Drone-Captured Images Based on Improved YOLOv7

D Zhao, F Shao, Q Liu, L Yang, H Zhang, Z Zhang - Remote Sensing, 2024 - mdpi.com
Due to the broad usage and widespread popularity of drones, the demand for a more
accurate object detection algorithm for images captured by drone platforms has become …

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

A Kosson, B Messmer, M Jaggi - arXiv preprint arXiv:2410.23922, 2024 - arxiv.org
Learning Rate Warmup is a popular heuristic for training neural networks, especially at
larger batch sizes, despite limited understanding of its benefits. Warmup decreases the …

Role of Momentum in Smoothing Objective Function in Implicit Graduated Optimization

N Sato, H Iiduka - arXiv preprint arXiv:2402.02325, 2024 - arxiv.org
While stochastic gradient descent (SGD) with momentum has fast convergence and
excellent generalizability, a theoretical explanation for this is lacking. In this paper, we show …

Semantic image segmentation and automated structure picking from acoustic televiewer images

P Perritaz - 2024 - research-collection.ethz.ch
In fields such as groundwater extraction, geothermal energy, nuclear waste
storage, and geoenergy exploration, borehole imaging is crucial for understanding fluid …

Analyzing & Eliminating Learning Rate Warmup in GPT Pre-Training

A Kosson, B Messmer, M Jaggi - … 2024: The Emergence of Structure and … - openreview.net
Learning Rate Warmup is a popular heuristic for training neural networks, which downscales
early updates relative to later ones. This aids training, suggesting that the initial updates are …

Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults

P Phunyaphibarn, J Lee, B Wang, H Zhang… - … Learning Dynamics 2024 … - openreview.net
Although gradient descent with Polyak's momentum is widely used in modern machine and
deep learning, a concrete understanding of its effects on the training trajectory remains …