Spot-adaptive knowledge distillation

J Song, Y Chen, J Ye, M Song - IEEE Transactions on Image …, 2022 - ieeexplore.ieee.org
Knowledge distillation (KD) has become a well-established paradigm for compressing deep
neural networks. The typical way of conducting knowledge distillation is to train the student …
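For reference on the teacher-student setup this snippet describes, below is a minimal sketch of the vanilla KD objective (softened-logit KL plus cross-entropy, in the style of Hinton et al.), not the spot-adaptive variant from the paper itself; the function name, temperature T, and weight alpha are illustrative assumptions.

```python
# Minimal sketch of a vanilla KD objective: the student matches the
# teacher's temperature-softened logits while also fitting hard labels.
# Not the spot-adaptive method of the paper above; T and alpha are
# illustrative choices.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # KL divergence between temperature-softened distributions,
    # scaled by T^2 as is conventional.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```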

Up to 100x faster data-free knowledge distillation

G Fang, K Mo, X Wang, J Song, S Bei… - Proceedings of the …, 2022 - ojs.aaai.org
Data-free knowledge distillation (DFKD) has recently been attracting increasing attention
from research communities, attributed to its capability to compress a model only using …
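For context on what "data-free" means here, the following is a hedged sketch of a generic generator-based DFKD loop (not the 100x-faster scheme of the paper above): a generator synthesizes inputs from noise, and the student is trained to match the teacher on those synthetic inputs. All names, the adversarial generator loss, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch of a generic data-free KD step: synthesize inputs with a
# generator, then distill the teacher's responses on them into the student.
import torch
import torch.nn.functional as F

def dfkd_step(generator, teacher, student, g_opt, s_opt,
              batch_size=64, z_dim=100, T=4.0):
    z = torch.randn(batch_size, z_dim)

    # 1) Update the generator to produce samples on which the student
    #    disagrees with the teacher (adversarial data synthesis).
    fake = generator(z)
    with torch.no_grad():
        t_out = teacher(fake)
    g_loss = -F.kl_div(
        F.log_softmax(student(fake) / T, dim=1),
        F.softmax(t_out / T, dim=1),
        reduction="batchmean",
    )
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

    # 2) Update the student to match the teacher on the synthetic batch.
    fake = fake.detach()
    with torch.no_grad():
        t_out = teacher(fake)
    s_loss = F.kl_div(
        F.log_softmax(student(fake) / T, dim=1),
        F.softmax(t_out / T, dim=1),
        reduction="batchmean",
    )
    s_opt.zero_grad()
    s_loss.backward()
    s_opt.step()
    return g_loss.item(), s_loss.item()
```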

Data-free knowledge transfer: A survey

Y Liu, W Zhang, J Wang, J Wang - arXiv preprint arXiv:2112.15278, 2021 - arxiv.org
In the last decade, many deep learning models have been well trained and have achieved great
success in various fields of machine intelligence, especially computer vision and natural …

Connective prediction for implicit discourse relation recognition via knowledge distillation

H Wu, H Zhou, M Lan, Y Wu… - Proceedings of the 61st …, 2023 - aclanthology.org
Implicit discourse relation recognition (IDRR) remains a challenging task in discourse
analysis due to the absence of connectives. Most existing methods utilize one-hot labels as …

Are large kernels better teachers than transformers for convnets?

T Huang, L Yin, Z Zhang, L Shen… - International …, 2023 - proceedings.mlr.press
This paper reveals a new appeal of the recently emerged large-kernel Convolutional Neural
Networks (ConvNets): as the teacher in Knowledge Distillation (KD) for small-kernel …

Momentum adversarial distillation: Handling large distribution shifts in data-free knowledge distillation

K Do, TH Le, D Nguyen, D Nguyen… - Advances in …, 2022 - proceedings.neurips.cc
Data-free Knowledge Distillation (DFKD) has attracted attention recently thanks to
its appealing capability of transferring knowledge from a teacher network to a student …

Towards anytime fine-tuning: Continually pre-trained language models with hypernetwork prompt

G Jiang, C Jiang, S Xue, JY Zhang, J Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
Continual pre-training has become urgent for adapting a pre-trained model to a multitude of
domains and tasks in the fast-evolving world. In practice, a continually pre-trained model is …

On the effectiveness of out-of-distribution data in self-supervised long-tail learning

J Bai, Z Liu, H Wang, J Hao, Y Feng, H Chu… - arXiv preprint arXiv …, 2023 - arxiv.org
Though self-supervised learning (SSL) has been widely studied as a promising technique
for representation learning, it does not generalize well on long-tailed datasets due to the …

Ideal: Query-efficient data-free learning from black-box models

J Zhang, C Chen, L Lyu - The Eleventh International Conference on …, 2022 - openreview.net
Knowledge Distillation (KD) is a typical method for training a lightweight student model with
the help of a well-trained teacher model. However, most KD methods require access to …

Building Variable-Sized Models via Learngene Pool

B Shi, S Xia, X Yang, H Chen, Z Kou… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Recently, Stitchable Neural Networks (SN-Net) have been proposed to stitch pre-
trained networks for quickly building numerous networks with different complexity and …