Introduction to Transformers: an NLP Perspective

T Xiao, J Zhu - arXiv preprint arXiv:2311.17633, 2023 - arxiv.org
Transformers have dominated empirical machine learning models of natural language
processing. In this paper, we introduce basic concepts of Transformers and present key …

PartialFormer: Modeling Part Instead of Whole

T Zheng, B Li, H Bao, W Shan, T Xiao, J Zhu - arXiv preprint arXiv …, 2023 - arxiv.org
The design choices in Transformer feed-forward neural networks have resulted in significant
computational and parameter overhead. In this work, we emphasize the importance of …

PartialFormer: Modeling Part Instead of Whole for Machine Translation

T Zheng, B Li, H Bao, J Wang, W Shan… - Findings of the …, 2024 - aclanthology.org
The design choices in Transformer feed-forward neural networks have resulted in significant
computational and parameter overhead. In this work, we emphasize the importance of …