BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference

C Lee, SM Kwon, Q Qu, HS Kim - arXiv preprint arXiv:2410.21262, 2024 - arxiv.org
Large-scale foundation models have demonstrated exceptional performance in language
and vision tasks. However, the numerous dense matrix-vector operations involved in these …

4-bit Shampoo for Memory-Efficient Network Training

S Wang, J Li, P Zhou, H Huang - arXiv preprint arXiv:2405.18144, 2024 - arxiv.org
Second-order optimizers, maintaining a matrix termed a preconditioner, are superior to
first-order optimizers in both theory and practice. The states forming the preconditioner and its …

Adaptive Curvature Step Size: A Path Geometry Based Approach to Optimization

R Madhavan - openreview.net
We propose the Adaptive Curvature Step Size (ACSS) method, which dynamically adjusts
the step size based on the local geometry of the optimization path. Our approach computes …