Layer-Condensed KV Cache for Efficient Inference of Large Language Models

H Wu, K Tu - arXiv preprint arXiv:2405.10637, 2024 - arxiv.org
Huge memory consumption has been a major bottleneck for deploying high-throughput
large language models in real-world applications. In addition to the large number of …

Towards understanding how attention mechanism works in deep learning

T Ruan, S Zhang - arXiv preprint arXiv:2412.18288, 2024 - arxiv.org
The attention mechanism has been extensively integrated into mainstream neural network
architectures, such as Transformers and graph attention networks. Yet, its underlying …