Prompting to distill: Boosting data-free knowledge distillation via reinforced prompt

X Ma, X Wang, G Fang, Y Shen, W Lu - arXiv preprint arXiv:2205.07523, 2022 - arxiv.org
Data-free knowledge distillation (DFKD) performs knowledge distillation while eliminating the
dependence on the original training data, and has recently achieved impressive results in …
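A minimal, illustrative sketch of the generic data-free distillation loop that snippets like this one refer to: synthetic inputs stand in for the missing training set, and the student is fit to the teacher's outputs on them. The generator G, latent size z_dim, and temperature T are assumptions for illustration, not details of the cited paper.

import torch
import torch.nn.functional as F

def dfkd_step(generator, teacher, student, optimizer, batch_size=64, z_dim=100, T=4.0):
    # Sample latent noise and synthesize a pseudo-batch; no real data is touched.
    teacher.eval()
    z = torch.randn(batch_size, z_dim)
    with torch.no_grad():
        x_fake = generator(z)          # hypothetical generator: noise -> images
        t_logits = teacher(x_fake)     # teacher supervision on synthetic inputs
    s_logits = student(x_fake)
    # Match the student's softened predictions to the teacher's (KL divergence).
    loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

How the generator itself is trained (adversarial objectives, batch-norm statistics matching, reinforced prompts, etc.) is where the papers in this list differ; the sketch above only fixes the distillation half of the loop.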

Up to 100x faster data-free knowledge distillation

G Fang, K Mo, X Wang, J Song, S Bei… - Proceedings of the …, 2022 - ojs.aaai.org
Data-free knowledge distillation (DFKD) has recently been attracting increasing attention
from the research community, owing to its capability to compress a model only using …

What makes a" good" data augmentation in knowledge distillation-a statistical perspective

H Wang, S Lohit, MN Jones… - Advances in Neural …, 2022 - proceedings.neurips.cc
Knowledge distillation (KD) is a general neural network training approach that uses
a teacher model to guide the student model. Existing works mainly study KD from the …
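For reference, the vanilla teacher-guided objective these abstracts build on is the temperature-softened distillation loss of Hinton et al. (2015). The sketch below is generic, not the method of any single paper in this list; the temperature T and mixing weight alpha are illustrative defaults.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft part: KL divergence between temperature-softened distributions,
    # rescaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard part: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard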

Understanding and improving knowledge distillation

J Tang, R Shivanna, Z Zhao, D Lin, A Singh… - arXiv preprint arXiv …, 2020 - arxiv.org
Knowledge Distillation (KD) is a model-agnostic technique to improve model quality under
a fixed capacity budget. It is a commonly used technique for model compression …

Explicit and implicit knowledge distillation via unlabeled data

Y Wang, Z Ge, Z Chen, X Liu, C Ma… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Data-free knowledge distillation is a challenging model compression task for scenarios in
which the original dataset is not available. Previous methods require a lot of extra …

On the orthogonality of knowledge distillation with other techniques: From an ensemble perspective

SU Park, KY Yoo, N Kwak - arXiv preprint arXiv:2009.04120, 2020 - arxiv.org
To put a state-of-the-art neural network to practical use, it is necessary to design a model
that has a good trade-off between resource consumption and performance on the test …

Discovering and overcoming limitations of noise-engineered data-free knowledge distillation

P Raikwar, D Mishra - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Distillation in neural networks using only samples randomly drawn from a Gaussian
distribution is possibly the most straightforward solution one can think of for the complex …
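The noise-only baseline this snippet describes can be sketched in a few lines: draw inputs from a standard Gaussian and distill on the teacher's responses to them. The input shape (3, 32, 32) and the temperature are assumptions for illustration.

import torch
import torch.nn.functional as F

def noise_distill_loss(teacher, student, batch_size=64, T=4.0):
    # The "dataset" is pure Gaussian noise shaped like CIFAR-style images (assumed shape).
    x = torch.randn(batch_size, 3, 32, 32)
    with torch.no_grad():
        t_logits = teacher(x)          # teacher responses to random noise
    s_logits = student(x)
    # Same softened-KL matching as in ordinary distillation.
    return F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)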

Teach less, learn more: On the undistillable classes in knowledge distillation

Y Zhu, N Liu, Z Xu, X Liu, W Meng… - Advances in …, 2022 - proceedings.neurips.cc
Knowledge distillation (KD) can effectively compress neural networks by training a
smaller network (student) to simulate the behavior of a larger one (teacher). A counter …

ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation

D Doshi, JE Kim - arXiv preprint arXiv:2404.09886, 2024 - arxiv.org
In this research, we propose an innovative method to boost Knowledge Distillation efficiency
without the need for resource-heavy teacher models. Knowledge Distillation trains a …

CILDA: Contrastive data augmentation using intermediate layer knowledge distillation

MA Haidar, M Rezagholizadeh, A Ghaddar… - arXiv preprint arXiv …, 2022 - arxiv.org
Knowledge distillation (KD) is an efficient framework for compressing large-scale pre-trained
language models. Recent years have seen a surge of research aiming to improve KD by …