A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Towards unified deep image deraining: A survey and a new benchmark

X Chen, J Pan, J Dong, J Tang - arXiv preprint arXiv:2310.03535, 2023 - arxiv.org
Recent years have witnessed significant advances in image deraining, driven by a variety of
effective image priors and deep learning models. As each deraining approach has …

LLMLingua: Compressing prompts for accelerated inference of large language models

H Jiang, Q Wu, CY Lin, Y Yang, L Qiu - arXiv preprint arXiv:2310.05736, 2023 - arxiv.org
Large language models (LLMs) have been adopted in a wide range of applications due to their
astonishing capabilities. With advances in techniques such as chain-of-thought (CoT) …
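Note: the entry above concerns prompt compression. As a rough illustration of the general idea only (not LLMLingua's actual coarse-to-fine algorithm), the sketch below drops the prompt tokens that a small language model finds most predictable and keeps a target fraction. The per-token log-probabilities are assumed to come from some causal LM; the names and values in the example are made up.

```python
# Minimal sketch of perplexity-guided prompt compression (illustrative only;
# not the LLMLingua algorithm). Tokens that a small language model predicts
# easily are assumed to carry little information and are dropped first.

def compress_prompt(tokens, token_logprobs, keep_ratio=0.5):
    """Keep the `keep_ratio` fraction of tokens that are hardest to predict.

    tokens         : list[str]   -- prompt split into tokens
    token_logprobs : list[float] -- log p(token | prefix) from a small LM (assumed input)
    """
    assert len(tokens) == len(token_logprobs)
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Lower log-prob => more surprising => more informative => keep.
    ranked = sorted(range(len(tokens)), key=lambda i: token_logprobs[i])
    kept = sorted(ranked[:n_keep])          # restore original token order
    return [tokens[i] for i in kept]

if __name__ == "__main__":
    toks = ["Please", "kindly", "summarize", "the", "following", "report", "now"]
    # Hypothetical per-token log-probs (higher = more predictable).
    lps = [-1.2, -0.3, -4.1, -0.2, -0.5, -3.8, -0.9]
    print(compress_prompt(toks, lps, keep_ratio=0.5))
```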

Evo-ViT: Slow-fast token evolution for dynamic vision transformer

Y Xu, Z Zhang, M Zhang, K Sheng, K Li… - Proceedings of the …, 2022 - ojs.aaai.org
Vision transformers (ViTs) have recently surged in popularity, but their huge
computational cost remains a severe issue. Since the computational complexity of ViT is …
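Note: the entry above addresses the cost of processing every ViT token in full. The sketch below shows a generic form of attention-guided token selection (keep the patch tokens the CLS token attends to most, fold the rest into one summary token), which is the broad idea behind slow-fast token handling but not Evo-ViT's exact update rule; all names and shapes are illustrative assumptions.

```python
# Generic sketch of attention-guided token selection for a ViT layer
# (illustrative; not Evo-ViT's exact slow-fast evolution procedure).
import numpy as np

def select_informative_tokens(tokens, cls_attn, keep_ratio=0.5):
    """Split patch tokens into informative and placeholder groups.

    tokens   : (N, D) patch token embeddings (CLS token excluded)
    cls_attn : (N,)   attention weights from the CLS token to each patch token
    Returns the kept tokens plus one aggregated token summarizing the rest,
    so the expensive blocks see a shorter sequence.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    order = np.argsort(-cls_attn)                # most-attended first
    keep_idx, drop_idx = order[:n_keep], order[n_keep:]
    kept = tokens[keep_idx]
    if len(drop_idx) > 0:
        # Aggregate uninformative tokens into one attention-weighted summary token.
        w = cls_attn[drop_idx] / (cls_attn[drop_idx].sum() + 1e-8)
        summary = (w[:, None] * tokens[drop_idx]).sum(axis=0, keepdims=True)
        kept = np.concatenate([kept, summary], axis=0)
    return kept

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toks = rng.normal(size=(196, 64))             # 14x14 patch tokens
    attn = rng.random(196); attn /= attn.sum()    # CLS attention over patches
    print(select_informative_tokens(toks, attn, keep_ratio=0.25).shape)  # (50, 64)
```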

Less is more: Focus attention for efficient DETR

D Zheng, W Dong, H Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
DETR-like models have significantly boosted the performance of detectors and even
outperformed classical convolutional models. However, all tokens are treated equally …

The Optimal BERT Surgeon: Scalable and accurate second-order pruning for large language models

E Kurtic, D Campos, T Nguyen, E Frantar… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer-based language models have become a key building block for natural
language processing. While these models are extremely accurate, they can be too large and …
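Note: the title above refers to second-order pruning. The sketch below works through the classic optimal-brain-surgeon saliency score rho_i = w_i^2 / (2 [H^{-1}]_ii) under a diagonal Hessian approximation, which is the scoring rule such methods build on; it is not the paper's blocked, scalable solver, and the curvature estimates here are assumed inputs.

```python
# Sketch of second-order (optimal-brain-surgeon style) weight scoring with a
# diagonal Hessian approximation (illustrative; not the paper's blocked solver).
import numpy as np

def obs_saliency(weights, hessian_diag):
    """Saliency rho_i = w_i^2 / (2 * [H^{-1}]_ii). With a diagonal Hessian,
    [H^{-1}]_ii = 1 / H_ii, so rho_i = 0.5 * w_i^2 * H_ii.
    Weights with the smallest saliency are cheapest to remove."""
    return 0.5 * weights ** 2 * hessian_diag

def prune_by_saliency(weights, hessian_diag, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the lowest saliency."""
    scores = obs_saliency(weights, hessian_diag)
    k = int(len(weights) * sparsity)
    cut = np.partition(scores, k)[k]              # k-th smallest score
    mask = scores >= cut
    return weights * mask, mask

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.normal(size=8)
    h = rng.random(8) + 0.1                       # assumed positive curvature estimates
    pruned, mask = prune_by_saliency(w, h, sparsity=0.5)
    print(mask, pruned)
```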

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

SparseViT: Revisiting activation sparsity for efficient high-resolution vision transformer

X Chen, Z Liu, H Tang, L Yi… - Proceedings of the …, 2023 - openaccess.thecvf.com
High-resolution images enable neural networks to learn richer visual representations.
However, this improved performance comes at the cost of growing computational …

Model tells you what to discard: Adaptive KV cache compression for LLMs

S Ge, Y Zhang, L Liu, M Zhang, J Han, J Gao - arXiv preprint arXiv …, 2023 - arxiv.org
In this study, we introduce adaptive KV cache compression, a plug-and-play method that
reduces the memory footprint of generative inference for Large Language Models (LLMs) …
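Note: the snippet above describes a plug-and-play reduction of the KV cache's memory footprint. The sketch below shows the generic eviction idea (drop cached key/value entries for past tokens that have received little attention), not the paper's adaptive, per-head compression policy; the accumulated-attention statistic is an assumed input.

```python
# Minimal sketch of KV cache eviction by accumulated attention (illustrative;
# not the paper's adaptive per-head compression policy).
import numpy as np

def evict_kv(keys, values, attn_history, cache_budget):
    """Keep at most `cache_budget` past positions in the KV cache.

    keys, values : (T, D) cached keys/values for T past tokens
    attn_history : (T,)   accumulated attention each past token has received
    """
    T = keys.shape[0]
    if T <= cache_budget:
        return keys, values, attn_history
    # Keep the most-attended positions, preserving their original order.
    keep = np.sort(np.argsort(-attn_history)[:cache_budget])
    return keys[keep], values[keep], attn_history[keep]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    K, V = rng.normal(size=(1000, 64)), rng.normal(size=(1000, 64))
    hist = rng.random(1000)
    K, V, hist = evict_kv(K, V, hist, cache_budget=256)
    print(K.shape, V.shape)   # (256, 64) (256, 64)
```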

Dynamic context pruning for efficient and interpretable autoregressive transformers

S Anagnostidis, D Pavllo, L Biggio… - Advances in …, 2024 - proceedings.neurips.cc
Autoregressive Transformers adopted in Large Language Models (LLMs) are hard
to scale to long sequences. Despite several works trying to reduce their computational cost …