Configurable foundation models: Building LLMs from a modular perspective

C Xiao, Z Zhang, C Song, D Jiang, F Yao, X Han… - arXiv preprint arXiv …, 2024 - arxiv.org
Advancements in LLMs have recently unveiled challenges tied to computational efficiency
and continual scalability due to their huge parameter counts, making the …

Prompt-prompted adaptive structured pruning for efficient LLM generation

H Dong, B Chen, Y Chi - First Conference on Language Modeling, 2024 - openreview.net
Transformer-based large language models (LLMs) have been applied to many fields due to
their remarkable utility, but this comes at a considerable …

Prompt-prompted Mixture of Experts for Efficient LLM Generation

H Dong, B Chen, Y Chi - arXiv preprint arXiv:2404.01365, 2024 - arxiv.org
Transformer-based large language models (LLMs) have been applied to many fields due to
their remarkable utility, but this comes at a considerable …

Conditional computation in neural networks: Principles and research trends

S Scardapane, A Baiocchi, A Devoto… - Intelligenza …, 2024 - journals.sagepub.com
This article summarizes principles and ideas from the emerging area of applying conditional
computation methods to the design of neural networks. In particular, we focus on neural …
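
As a generic illustration of conditional computation (a simple gated-skip form assumed here; the survey covers a much broader taxonomy), the sketch below lets a cheap learned gate decide per input whether an expensive sub-network runs at all. The class name `GatedBlock` and the fixed inference threshold are illustrative assumptions.

```python
# Minimal conditional-computation sketch: a cheap gate decides per input
# whether the expensive path is executed. Not tied to any specific method
# from the survey; names and the threshold value are illustrative.
import torch
import torch.nn as nn


class GatedBlock(nn.Module):
    def __init__(self, d: int = 256, threshold: float = 0.5):
        super().__init__()
        self.gate = nn.Linear(d, 1)                    # cheap routing decision
        self.heavy = nn.Sequential(                    # expensive path
            nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d)
        )
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = torch.sigmoid(self.gate(x))                # (batch, 1) execution probability
        if self.training:
            # Soft gating keeps the decision differentiable during training.
            return x + p * self.heavy(x)
        # At inference, skip the heavy path entirely for "easy" inputs.
        run = p.squeeze(-1) > self.threshold
        out = x.clone()
        if run.any():
            out[run] = x[run] + self.heavy(x[run])
        return out


if __name__ == "__main__":
    block = GatedBlock().eval()
    x = torch.randn(8, 256)
    print(block(x).shape)   # torch.Size([8, 256])
```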

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

R Cai, Y Ro, GW Kim, P Wang, BE Bejnordi… - arXiv preprint arXiv …, 2024 - arxiv.org
The proliferation of large language models (LLMs) has led to the adoption of Mixture-of-
Experts (MoE) architectures that dynamically leverage specialized subnetworks for improved …
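
For orientation only, a generic top-k token-choice MoE layer is sketched below; it is background for the two Read-ME entries in this listing, not a reproduction of their router-decoupled design or system co-design. The class name `TopKMoE`, the expert count, and k are illustrative assumptions.

```python
# Generic top-k token-choice MoE layer (background sketch only; Read-ME's
# router-decoupled architecture is not reproduced here).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d: int = 256, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d). Route each token to its k highest-scoring experts.
        logits = self.router(x)                            # (tokens, n_experts)
        weights, idx = torch.topk(logits, self.k, dim=-1)  # both (tokens, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Which tokens picked expert e, and in which top-k slot.
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out


if __name__ == "__main__":
    moe = TopKMoE()
    tokens = torch.randn(32, 256)
    print(moe(tokens).shape)   # torch.Size([32, 256])
```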

Ada-K Routing: Boosting the Efficiency of MoE-based LLMs

T Yue, L Guo, J Cheng, X Gao, J Liu - arXiv preprint arXiv:2410.10456, 2024 - arxiv.org
In the era of Large Language Models (LLMs), Mixture-of-Experts (MoE) architectures offer a
promising approach to managing computational costs while scaling up model parameters …
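
To make "a variable number of experts per token" concrete, the sketch below keeps adding experts until the router's cumulative probability mass passes a fixed threshold. This cumulative-mass heuristic, the function name `adaptive_expert_selection`, and the cap `max_k` are stand-in assumptions; they are not the learned allocation policy proposed in Ada-K Routing.

```python
# Illustrative per-token adaptive expert counts: add experts until the router's
# cumulative probability mass passes a threshold. A simple stand-in heuristic,
# not the allocation policy learned in Ada-K Routing.
import torch
import torch.nn as nn
import torch.nn.functional as F


def adaptive_expert_selection(router_logits: torch.Tensor,
                              mass_threshold: float = 0.6,
                              max_k: int = 4):
    """Return, for each token, a variable-length list of (expert_id, weight)."""
    probs = F.softmax(router_logits, dim=-1)                 # (tokens, n_experts)
    sorted_p, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_p.cumsum(dim=-1)
    selections = []
    for t in range(probs.shape[0]):
        # Smallest k whose probability mass exceeds the threshold (capped at max_k).
        k = min(int((cum[t] < mass_threshold).sum().item()) + 1, max_k)
        ids = sorted_idx[t, :k]
        w = sorted_p[t, :k] / sorted_p[t, :k].sum()          # renormalize kept mass
        selections.append(list(zip(ids.tolist(), w.tolist())))
    return selections


if __name__ == "__main__":
    router = nn.Linear(256, 8)
    tokens = torch.randn(5, 256)
    for t, sel in enumerate(adaptive_expert_selection(router(tokens))):
        print(f"token {t}: {len(sel)} experts -> {sel}")
```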

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

R Cai, Y Ro, GW Kim, P Wang, BE Bejnordi… - The Thirty-eighth Annual … - openreview.net
The proliferation of large language models (LLMs) has led to the adoption of Mixture-of-
Experts (MoE) architectures that dynamically leverage specialized subnetworks for improved …