H2O: Heavy-hitter oracle for efficient generative inference of large language models

Z Zhang, Y Sheng, T Zhou, T Chen… - Advances in …, 2023 - proceedings.neurips.cc
Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …
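
A minimal sketch of the heavy-hitter idea named in the title, under assumed shapes and names (the scoring rule, window sizes, and function below are illustrative, not the paper's algorithm): keep the cached key/value entries that have accumulated the most attention, plus a recent window, and evict the rest.

    import numpy as np

    def evict_kv_cache(attn_weights, keep_heavy=4, keep_recent=4):
        # attn_weights: (num_queries, num_cached_tokens) rows of softmax attention weights.
        scores = attn_weights.sum(axis=0)                 # cumulative attention per cached token
        n = scores.shape[0]
        recent = set(range(max(0, n - keep_recent), n))   # always keep a recent window
        heavy = set(int(i) for i in np.argsort(scores)[::-1][:keep_heavy])  # top cumulative scores
        return sorted(recent | heavy)                     # indices of KV entries to keep

    # Toy usage: 8 decoding steps attending over 12 cached tokens.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 12))
    weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(evict_kv_cache(weights))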

Deja vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …
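
As a rough illustration of contextual sparsity (assumed setup; the paper trains cheap predictors rather than computing exact pre-activations as done here): for each input, select only the feed-forward neurons predicted to matter and skip the rest of the computation.

    import numpy as np

    def contextually_sparse_ffn(x, W1, W2, top_k):
        pre = W1 @ x                             # in practice a small learned predictor scores neurons cheaply
        idx = np.argsort(np.abs(pre))[-top_k:]   # neurons that matter for this particular input
        h = np.maximum(pre[idx], 0.0)            # ReLU only on the selected neurons
        return W2[:, idx] @ h                    # project back using only those columns

    # Toy usage: a 4096-wide feed-forward block where only 256 neurons fire for this input.
    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(4096, 1024)), rng.normal(size=(1024, 4096))
    print(contextually_sparse_ffn(rng.normal(size=1024), W1, W2, top_k=256).shape)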

Attention scheme inspired softmax regression

Y Deng, Z Li, Z Song - arXiv preprint arXiv:2304.10411, 2023 - arxiv.org
Large language models (LLMs) have made transformative changes to human society. One of
the key computations in LLMs is the softmax unit. This operation is important in LLMs …
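
For reference, the softmax unit maps $z \in \mathbb{R}^n$ to $\mathrm{softmax}(z)_i = \exp(z_i) / \sum_{j=1}^{n} \exp(z_j)$; the softmax regression problem studied in this line of work is, roughly, $\min_{x \in \mathbb{R}^d} \| \langle \exp(Ax), \mathbf{1}_n \rangle^{-1} \exp(Ax) - b \|_2$ for given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$ (the exact formulation is as stated in the paper).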

A Nearly-Optimal Bound for Fast Regression with Guarantee

Z Song, M Ye, J Yin, L Zhang - International Conference on …, 2023 - proceedings.mlr.press
Given a matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b \in \mathbb{R}^n$, we
consider the regression problem with $\ell_\infty$ guarantees: finding a vector …
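
Read loosely, the $\ell_\infty$ guarantee asks for an approximate solution $x'$ that is close coordinate-wise to the least-squares minimizer $x^* = \arg\min_{x} \|Ax - b\|_2$, i.e., $\|x' - x^*\|_\infty \le \epsilon$ up to the problem-dependent scaling stated in the paper, which is a stronger per-coordinate requirement than the usual $\ell_2$ error bound.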

Training multi-layer over-parametrized neural network in subquadratic time

Z Song, L Zhang, R Zhang - arXiv preprint arXiv:2112.07628, 2021 - arxiv.org
We consider the problem of training a multi-layer over-parametrized neural network to
minimize the empirical risk induced by a loss function. In the typical setting of over …
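
Here the empirical risk is the usual average loss over the training set, $\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_{\theta}(x_i), y_i\big)$, and the over-parametrized setting means the network width is large relative to the number of training points $n$.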

Sketching for first order method: efficient algorithm for low-bandwidth channel and vulnerability

Z Song, Y Wang, Z Yu, L Zhang - … Conference on Machine …, 2023 - proceedings.mlr.press
Sketching is one of the most fundamental tools in large-scale machine learning. It enables
runtime and memory savings via randomly compressing the original large problem into lower …
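
A minimal numerical sketch of the compression idea (assumed Gaussian sketch; the paper's scheme and analysis may differ): project a high-dimensional vector, e.g. a gradient, down to a short message, then de-sketch it on the receiving end into an unbiased but noisy estimate.

    import numpy as np

    d, m = 10_000, 200                                   # original vs. sketched dimension
    rng = np.random.default_rng(0)
    S = rng.normal(scale=1.0 / np.sqrt(m), size=(m, d))  # random Gaussian sketching matrix

    g = rng.normal(size=d)                               # e.g. a gradient to be communicated
    sk = S @ g                                           # low-bandwidth message of length m
    g_est = S.T @ sk                                     # de-sketch: E[S^T S] = I, so unbiased estimate of g
    print(np.linalg.norm(sk) / np.linalg.norm(g))        # ~1: norms are approximately preserved
    print(np.dot(g_est, g) / np.dot(g, g))               # ~1: alignment with the true vector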

Multi-layer transformers gradient can be approximated in almost linear time

Y Liang, Z Sha, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2408.13233, 2024 - arxiv.org
The computational complexity of the self-attention mechanism in popular transformer
architectures poses significant challenges for training and inference, and becomes the …
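
Concretely, for sequence length $n$ and head dimension $d$, exact self-attention $\mathrm{softmax}(QK^\top/\sqrt{d})V$ costs $\Theta(n^2 d)$ time, which is what makes an almost linear time approximation of the gradient significant for long sequences.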

GradientCoin: A peer-to-peer decentralized large language models

Y Gao, Z Song, J Yin - arXiv preprint arXiv:2308.10502, 2023 - arxiv.org
Since its proposal as an electronic cash system in 2008, Bitcoin has fundamentally
changed the economic system over the last decade. Since 2022, large language models …

HSR-enhanced sparse attention acceleration

B Chen, Y Liang, Z Sha, Z Shi, Z Song - arXiv preprint arXiv:2410.10165, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
applications, but their performance on long-context tasks is often limited by the …

InfoPrompt: Information-theoretic soft prompt tuning for natural language understanding

J Wu, T Yu, R Wang, Z Song, R Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Soft prompt tuning achieves superior performance across a wide range of few-shot tasks.
However, the performance of prompt tuning can be highly sensitive to the initialization of …
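
A minimal soft-prompt sketch under assumed shapes (illustrative only; it does not include InfoPrompt's information-theoretic objective): learnable prompt embeddings are prepended to the frozen model's input embeddings, and only the prompt parameters are trained, which is why their initialization matters.

    import torch
    import torch.nn as nn

    class SoftPrompt(nn.Module):
        def __init__(self, prompt_len, hidden_dim):
            super().__init__()
            # Learnable prompt vectors; this initialization is the sensitive part noted above.
            self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)

        def forward(self, token_embeds):                     # token_embeds: (batch, seq, hidden)
            batch = token_embeds.size(0)
            prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
            return torch.cat([prompt, token_embeds], dim=1)  # prepend the prompt to the sequence

    # Usage: prepend a 10-token soft prompt to a batch of input embeddings.
    embeds = torch.randn(2, 16, 768)
    print(SoftPrompt(10, 768)(embeds).shape)                 # torch.Size([2, 26, 768])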