Solving attention kernel regression problem via pre-conditioner

Z Song, J Yin, L Zhang - International Conference on …, 2024 - proceedings.mlr.press
The attention mechanism is the key to large language models, and the attention matrix serves as an
algorithmic and computational bottleneck for such a scheme. In this paper, we define two …

Fast attention requires bounded entries

J Alman, Z Song - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In modern machine learning, inner product attention computation is a fundamental task for
training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and …

Sharp analysis of low-rank kernel matrix approximations

F Bach - Conference on learning theory, 2013 - proceedings.mlr.press
We consider supervised learning problems within the positive-definite kernel framework,
such as kernel ridge regression, kernel logistic regression or the support vector machine …

Solving regularized exp, cosh and sinh regression problems

Z Li, Z Song, T Zhou - arXiv preprint arXiv:2303.15725, 2023 - arxiv.org
In modern machine learning, attention computation is a fundamental task for training large
language models such as Transformer, GPT-4 and ChatGPT. In this work, we study …

Sparse factorization of square matrices with application to neural attention modeling

R Khalitov, T Yu, L Cheng, Z Yang - Neural Networks, 2022 - Elsevier
Square matrices appear in many machine learning problems and models. Optimization over
a large square matrix is expensive in memory and in time. Therefore an economic …

Learning-based low-rank approximations

P Indyk, A Vakilian, Y Yuan - Advances in Neural …, 2019 - proceedings.neurips.cc
We introduce a “learning-based” algorithm for the low-rank decomposition problem: given
an $n \times d$ matrix $A$ and a parameter $k$, compute a rank-$k$ matrix $A'$ that …
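The classical, non-learned baseline for the low-rank decomposition problem stated here is the truncated SVD, which gives the Frobenius-optimal rank-$k$ approximation by the Eckart–Young theorem. A minimal sketch (numpy; the matrix shapes are illustrative):

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation of A in Frobenius norm (Eckart-Young),
    obtained by truncating the SVD to the top-k singular triples."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 20))
A2 = rank_k_approx(A, 5)
# The squared error equals the discarded spectrum: ||A - A2||_F^2 = sum_{i>k} s_i^2
```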

In-context learning for attention scheme: from single softmax regression to multiple softmax regression via a tensor trick

Y Gao, Z Song, S Xie - arXiv preprint arXiv:2307.02419, 2023 - arxiv.org
Large language models (LLMs) have brought significant and transformative changes in
human society. These models have demonstrated remarkable capabilities in natural …

Weighted low-rank approximations

N Srebro, T Jaakkola - … of the 20th international conference on …, 2003 - cdn.aaai.org
We study the common problem of approximating a target matrix with a matrix of lower rank.
We provide a simple and efficient (EM) algorithm for solving weighted low-rank …
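The EM-style scheme for weighted low-rank approximation alternates between imputing low-weight entries with the current estimate and taking an unweighted truncated SVD. A minimal sketch in that spirit (assuming entrywise weights in $[0, 1]$; variable names and the demo data are illustrative, not from the paper):

```python
import numpy as np

def weighted_low_rank(A, W, k, iters=200):
    """Alternating EM-style scheme for min ||W^(1/2) * (A - L)||_F over rank-k L:
    impute low-weight entries from the current estimate (E-step), then take the
    best unweighted rank-k fit via truncated SVD (M-step)."""
    L = np.zeros_like(A)
    for _ in range(iters):
        X = W * A + (1.0 - W) * L                 # E-step: fill in with estimate
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :k] * s[:k]) @ Vt[:k, :]        # M-step: best rank-k fit to X
    return L

# Demo: recover a rank-2 matrix from roughly 80% of its entries (binary weights).
rng = np.random.default_rng(2)
A_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
W = (rng.uniform(size=A_true.shape) < 0.8).astype(float)
L = weighted_low_rank(A_true, W, k=2)
```

With binary weights this is exactly matrix completion by iterated SVD imputation; the weighted error is non-increasing across iterations.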

Sparse greedy matrix approximation for machine learning

AJ Smola, B Schölkopf - … of the seventeenth international conference on …, 2000 - dl.acm.org

Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint

R Luss, M Teboulle - SIAM Review, 2013 - SIAM
The sparsity constrained rank-one matrix approximation problem is a difficult mathematical
optimization problem which arises in a wide array of useful applications in engineering …