XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation

S Mukherjee, AH Awadallah, J Gao - arXiv preprint arXiv:2106.04563, 2021 - arxiv.org
While deep and large pre-trained models are the state-of-the-art for various natural
language processing tasks, their huge size poses significant challenges for practical uses in …

XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation

S Mukherjee, A Hassan Awadallah, J Gao - arXiv e-prints, 2021 - ui.adsabs.harvard.edu
While deep and large pre-trained models are the state-of-the-art for various natural
language processing tasks, their huge size poses significant challenges for practical uses in …