Parallelizing Linear Transformers with the Delta Rule over Sequence Length

S Yang, B Wang, Y Zhang, Y Shen, Y Kim - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers with linear attention (i.e., linear transformers) and state-space models have
recently been suggested as viable linear-time alternatives to transformers with softmax …
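
The title refers to the delta-rule update for linear transformers (fast-weight programmers), where the recurrent state is corrected toward each new value instead of simply accumulated. As a rough illustration only, here is a minimal sequential sketch of that standard recurrence, not the parallel-over-sequence-length algorithm this paper proposes; the function name, tensor shapes, and the choice to read the state after the update are assumptions for the example.

```python
import numpy as np

def delta_rule_recurrence(q, k, v, beta):
    """Sequential delta-rule recurrence for a linear transformer (illustrative sketch).

    q, k: (T, d_k) queries and keys; v: (T, d_v) values; beta: (T,) write strengths.
    State S is a (d_v, d_k) fast-weight matrix updated as
        S_t = S_{t-1} + beta_t * (v_t - S_{t-1} k_t) k_t^T
    with output o_t = S_t q_t (reading the state after the update is an assumption here).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))
    o = np.zeros((T, d_v))
    for t in range(T):
        pred = S @ k[t]                                   # current memory read for key k_t
        S = S + beta[t] * np.outer(v[t] - pred, k[t])     # delta-rule correction toward v_t
        o[t] = S @ q[t]                                   # query the updated state
    return o
```

This loop is O(T) but inherently sequential; the paper's contribution, per its title, is parallelizing such delta-rule linear transformers over the sequence length.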