T Huang, S You, F Wang, C Qian, C Xu - arXiv preprint arXiv:2205.10536, 2022 - arxiv.org
Unlike existing knowledge distillation methods, which focus on baseline settings where the teacher models and training strategies are not as strong and competitive as state-of-the-art …
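For context, a minimal sketch of the classic soft-label knowledge distillation objective (Hinton et al., 2015) that such "baseline settings" typically use; this is the standard baseline, not the method proposed in the cited paper, and the temperature `T` and weight `alpha` below are illustrative choices.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classic KD: KL on temperature-softened logits plus hard-label CE."""
    # Soften both distributions with temperature T.
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    # KL divergence from teacher to student, scaled by T^2 so its
    # gradient magnitude stays comparable to the hard-label term.
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * T * T
    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1.0 - alpha) * ce
```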