Block low-rank preconditioner with shared basis for stochastic optimization

3 results (0.02 seconds)

BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference

C Lee, SM Kwon, Q Qu, HS Kim - arXiv preprint arXiv:2410.21262, 2024 - arxiv.org

Large-scale foundation models have demonstrated exceptional performance in language
and vision tasks. However, the numerous dense matrix-vector operations involved in these …

4-bit Shampoo for Memory-Efficient Network Training

S Wang, J Li, P Zhou, H Huang - arXiv preprint arXiv:2405.18144, 2024 - arxiv.org

Second-order optimizers, maintaining a matrix termed a preconditioner, are superior to
first-order optimizers in both theory and practice. The states forming the preconditioner and its …

Adaptive Curvature Step Size: A Path Geometry Based Approach to Optimization

R Madhavan - openreview.net

We propose the Adaptive Curvature Step Size (ACSS) method, which dynamically adjusts
the step size based on the local geometry of the optimization path. Our approach computes …
