T Zheng, B Li, H Bao, J Wang, W Shan… - Findings of the …, 2024 - aclanthology.org
The design choices in Transformer feed-forward neural networks have resulted in significant
computational and parameter overhead. In this work, we emphasize the importance of …