Solving attention kernel regression problem via pre-conditioner

Z Song, J Yin, L Zhang - International Conference on …, 2024 - proceedings.mlr.press
The attention mechanism is the key to large language models, and the attention matrix serves as an
algorithmic and computational bottleneck for such a scheme. In this paper, we define two …

Fast attention requires bounded entries

J Alman, Z Song - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In modern machine learning, inner product attention computation is a fundamental task for
training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and …

Sharp analysis of low-rank kernel matrix approximations

F Bach - Conference on learning theory, 2013 - proceedings.mlr.press
We consider supervised learning problems within the positive-definite kernel framework,
such as kernel ridge regression, kernel logistic regression or the support vector machine …

Solving regularized exp, cosh and sinh regression problems

Z Li, Z Song, T Zhou - arXiv preprint arXiv:2303.15725, 2023 - arxiv.org
In modern machine learning, attention computation is a fundamental task for training large
language models such as Transformer, GPT-4 and ChatGPT. In this work, we study …

Sparse factorization of square matrices with application to neural attention modeling

R Khalitov, T Yu, L Cheng, Z Yang - Neural Networks, 2022 - Elsevier
Square matrices appear in many machine learning problems and models. Optimization over
a large square matrix is expensive in memory and in time. Therefore an economic …

Learning-based low-rank approximations

P Indyk, A Vakilian, Y Yuan - Advances in Neural …, 2019 - proceedings.neurips.cc
We introduce a “learning-based” algorithm for the low-rank decomposition problem: given
an $n \times d$ matrix $A$ and a parameter $k$, compute a rank-$k$ matrix $A'$ that …
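The classical, non-learned baseline for the low-rank decomposition problem stated here is the truncated SVD, which gives the Frobenius-optimal rank-$k$ approximation by the Eckart–Young theorem. A minimal sketch (numpy; the matrix shapes are illustrative):

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation of A in Frobenius norm (Eckart-Young),
    obtained by truncating the SVD to the top-k singular triples."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 20))
A2 = rank_k_approx(A, 5)
# The squared error equals the discarded spectrum: ||A - A2||_F^2 = sum_{i>k} s_i^2
```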

In-context learning for attention scheme: from single softmax regression to multiple softmax regression via a tensor trick

Y Gao, Z Song, S Xie - arXiv preprint arXiv:2307.02419, 2023 - arxiv.org
Large language models (LLMs) have brought significant and transformative changes in
human society. These models have demonstrated remarkable capabilities in natural …

Weighted low-rank approximations

N Srebro, T Jaakkola - … of the 20th international conference on …, 2003 - cdn.aaai.org
We study the common problem of approximating a target matrix with a matrix of lower rank.
We provide a simple and efficient (EM) algorithm for solving weighted low-rank …
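The EM-style scheme for weighted low-rank approximation alternates between imputing low-weight entries with the current estimate and taking an unweighted truncated SVD. A minimal sketch in that spirit (assuming entrywise weights in $[0, 1]$; variable names and the demo data are illustrative, not from the paper):

```python
import numpy as np

def weighted_low_rank(A, W, k, iters=200):
    """Alternating EM-style scheme for min ||W^(1/2) * (A - L)||_F over rank-k L:
    impute low-weight entries from the current estimate (E-step), then take the
    best unweighted rank-k fit via truncated SVD (M-step)."""
    L = np.zeros_like(A)
    for _ in range(iters):
        X = W * A + (1.0 - W) * L                 # E-step: fill in with estimate
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :k] * s[:k]) @ Vt[:k, :]        # M-step: best rank-k fit to X
    return L

# Demo: recover a rank-2 matrix from roughly 80% of its entries (binary weights).
rng = np.random.default_rng(2)
A_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
W = (rng.uniform(size=A_true.shape) < 0.8).astype(float)
L = weighted_low_rank(A_true, W, k=2)
```

With binary weights this is exactly matrix completion by iterated SVD imputation; the weighted error is non-increasing across iterations.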

Sparse greedy matrix approximation for machine learning

AJ Smola, B Schölkopf - … of the seventeenth international conference on …, 2000 - dl.acm.org

Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint

R Luss, M Teboulle - SIAM Review, 2013 - SIAM
The sparsity constrained rank-one matrix approximation problem is a difficult mathematical
optimization problem which arises in a wide array of useful applications in engineering …