Authors
Nikolay Penkov, Konstantinos Balaskas, Martin Rapp, Joerg Henkel
Publication date
2023/9/25
Journal
IEEE Embedded Systems Letters
Publisher
IEEE
Description
Transformer models are continuously achieving state-of-the-art performance on a wide range of benchmarks. To meet demanding performance targets, the number of model parameters is continuously increased. As a result, state-of-the-art Transformers require substantial computational resources, prohibiting their deployment on consumer-grade hardware. In the literature, overparameterized Transformers are successfully reduced in size with the help of pruning strategies. Existing works lack the ability to optimize the full architecture, without incurring significant overheads, in a fully differentiable manner. Our work proposes a single-stage approach for training a Transformer for memory-efficient inference and various resource-constrained scenarios. Transformer blocks are extended with trainable gate parameters, which attribute importance and control information flow. Their integration into a differentiable pruning …
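The truncated abstract does not specify the exact gating formulation, but the described idea of trainable gate parameters that attribute importance to Transformer blocks and keep pruning differentiable can be illustrated with a minimal sketch. The snippet below is an assumption-laden example, not the paper's implementation: it assumes a sigmoid-activated scalar gate (`gate_logit`) scaling a sublayer's residual contribution, with the class name `GatedSublayer` and the initialization value chosen purely for illustration.

```python
import torch
import torch.nn as nn


class GatedSublayer(nn.Module):
    """Wraps a Transformer sublayer (e.g., attention or FFN) with a trainable gate.

    The gate is a learnable scalar passed through a sigmoid, so the pruning
    decision stays differentiable; a gate near zero suppresses the sublayer's
    contribution, signalling that the sublayer can be removed at inference time.
    """

    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        # Hypothetical initialization: gate starts mostly "open" (sigmoid(2.0) ~ 0.88).
        self.gate_logit = nn.Parameter(torch.tensor(2.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.gate_logit)   # differentiable gate value in (0, 1)
        return x + gate * self.sublayer(x)      # residual branch scaled by the gate


# Usage: gate the feed-forward block of a toy Transformer layer.
ffn = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
gated_ffn = GatedSublayer(ffn)
out = gated_ffn(torch.randn(8, 16, 64))        # (batch, sequence, embedding dim)
print(out.shape)
```

In a full pruning setup, a sparsity-encouraging regularizer on the gate values would typically be added to the training loss so that low-importance blocks are driven toward zero; the paper's single-stage, resource-constrained formulation is not reproduced here.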
Scholar articles
N Penkov, K Balaskas, M Rapp, J Henkel - IEEE Embedded Systems Letters, 2023