Mosaicking to distill: Knowledge distillation from out-of-domain data

G Fang, Y Bao, J Song, X Wang, D Xie… - Advances in …, 2021 - proceedings.neurips.cc
Abstract: Knowledge distillation (KD) aims to craft a compact student model that imitates the
behavior of a pre-trained teacher in a target domain. Prior KD approaches, despite their
gratifying results, have largely relied on the premise that in-domain data is available
to carry out the knowledge transfer. Such an assumption, unfortunately, in many cases
violates the practical setting, since the original training data or even the data domain is often
unreachable due to privacy or copyright reasons. In this paper, we attempt to tackle an …
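For context, the teacher-imitation objective referenced in the abstract is, in standard KD, a KL divergence between temperature-softened teacher and student predictions. A minimal PyTorch sketch of that standard objective (the function name and temperature value are illustrative, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 4.0) -> torch.Tensor:
    """Standard KD objective (Hinton et al.): KL divergence between the
    temperature-softened teacher and student class distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```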

[PDF] Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data – Supplementary Material

G Fang, Y Bao, J Song, X Wang, D Xie, C Shen… - proceedings.neurips.cc
In this work, we deploy a generator to synthesize the transfer set for knowledge distillation.
Nevertheless, GANs are known to suffer from mode collapse and fail to produce diverse
patterns. To this end, we leverage both OOD data and synthetic samples to train our student
models, so that the generator does not need to synthesize all samples for KD. In addition, a
balance loss is deployed to alleviate mode collapse during training, defined as: …
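The balance loss itself is truncated in the snippet above. One common choice in data-free and OOD distillation work is an information-entropy term on the batch-averaged teacher prediction over synthesized samples, which penalizes collapse onto a few classes; the sketch below assumes that form and is not necessarily the paper's exact definition:

```python
import torch
import torch.nn.functional as F

def balance_loss(teacher_logits: torch.Tensor) -> torch.Tensor:
    """Illustrative class-balance term (an assumption, not the paper's exact
    formulation): maximize the entropy of the mean class distribution over a
    batch of synthesized samples, discouraging the generator from collapsing
    onto a small set of classes."""
    probs = F.softmax(teacher_logits, dim=1)   # per-sample class probabilities
    mean_probs = probs.mean(dim=0)             # batch-averaged class distribution
    entropy = -(mean_probs * torch.log(mean_probs + 1e-8)).sum()
    return -entropy  # minimizing this term maximizes the entropy
```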