Authors
Nikolay Penkov, Konstantinos Balaskas, Martin Rapp, Joerg Henkel
Publication date
2023/9/25
Journal
IEEE Embedded Systems Letters
Publisher
IEEE
Description
Transformer models are continuously achieving state-of-the-art performance on a wide range of benchmarks. To meet demanding performance targets, the number of model parameters is continuously increased. As a result, state-of-the-art Transformers require substantial computational resources, prohibiting their deployment on consumer-grade hardware. In the literature, overparameterized Transformers are successfully reduced in size with the help of pruning strategies. Existing works lack the ability to optimize the full architecture, without incurring significant overheads, in a fully differentiable manner. Our work proposes a single-stage approach for training a Transformer for memory-efficient inference and various resource-constrained scenarios. Transformer blocks are extended with trainable gate parameters, which attribute importance and control information flow. Their integration into a differentiable pruning …
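The truncated abstract does not specify the exact gating formulation, but the described idea of trainable gate parameters that attribute importance to Transformer blocks and keep pruning differentiable can be illustrated with a minimal sketch. The snippet below is an assumption-laden example, not the paper's implementation: it assumes a sigmoid-activated scalar gate (`gate_logit`) scaling a sublayer's residual contribution, with the class name `GatedSublayer` and the initialization value chosen purely for illustration.

```python
import torch
import torch.nn as nn


class GatedSublayer(nn.Module):
    """Wraps a Transformer sublayer (e.g., attention or FFN) with a trainable gate.

    The gate is a learnable scalar passed through a sigmoid, so the pruning
    decision stays differentiable; a gate near zero suppresses the sublayer's
    contribution, signalling that the sublayer can be removed at inference time.
    """

    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        # Hypothetical initialization: gate starts mostly "open" (sigmoid(2.0) ~ 0.88).
        self.gate_logit = nn.Parameter(torch.tensor(2.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.gate_logit)   # differentiable gate value in (0, 1)
        return x + gate * self.sublayer(x)      # residual branch scaled by the gate


# Usage: gate the feed-forward block of a toy Transformer layer.
ffn = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
gated_ffn = GatedSublayer(ffn)
out = gated_ffn(torch.randn(8, 16, 64))        # (batch, sequence, embedding dim)
print(out.shape)
```

In a full pruning setup, a sparsity-encouraging regularizer on the gate values would typically be added to the training loss so that low-importance blocks are driven toward zero; the paper's single-stage, resource-constrained formulation is not reproduced here.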
Scholar articles
N Penkov, K Balaskas, M Rapp, J Henkel - IEEE Embedded Systems Letters, 2023