Categories of response-based, feature-based, and relation-based knowledge distillation

C Yang, X Yu, Z An, Y Xu - … Distillation: Towards New Horizons of Intelligent …, 2023 - Springer
Deep neural networks have achieved remarkable performance on artificial intelligence
tasks. The success behind intelligent systems often relies on large-scale models with high …
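For orientation, the three categories named in the title can be illustrated with generic loss terms: response-based methods match output logits, feature-based methods match intermediate representations, and relation-based methods match pairwise structure between samples. The sketch below is a minimal PyTorch illustration of these generic forms, not the survey's specific formulations; the temperature and the learned projection module are assumptions.

```python
import torch
import torch.nn.functional as F

def response_based_kd(student_logits, teacher_logits, T=4.0):
    # Response-based KD: match softened output distributions (Hinton-style KL).
    p_t = F.softmax(teacher_logits / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def feature_based_kd(student_feat, teacher_feat, proj):
    # Feature-based KD: match intermediate features after a learned projection
    # (proj maps the student's feature dimension to the teacher's; assumed here).
    return F.mse_loss(proj(student_feat), teacher_feat)

def relation_based_kd(student_feat, teacher_feat):
    # Relation-based KD: match the pairwise similarity structure across the batch.
    def gram(x):
        x = F.normalize(x.flatten(1), dim=1)
        return x @ x.t()
    return F.mse_loss(gram(student_feat), gram(teacher_feat))
```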

A survey of the self supervised learning mechanisms for vision transformers

A Khan, A Sohail, M Fiaz, M Hassan, TH Afridi… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep supervised learning models require a high volume of labeled data to attain sufficiently
good results. However, the practice of gathering and annotating such big data is costly and …

Maskedkd: Efficient distillation of vision transformers with masked images

S Son, N Lee, J Lee - arXiv preprint arXiv:2302.10494, 2023 - arxiv.org
Knowledge distillation is an effective method for training lightweight models, but it adds
significant computational overhead to training, as the method requires …
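The snippet is truncated, but the title suggests that the cost of the teacher's forward pass is reduced by feeding it masked (subsampled) patch tokens. The sketch below is a hedged illustration of that general idea, assuming a ViT-style interface in which student and teacher consume patch-token sequences; the random selection rule, keep_ratio, and function names are illustrative, not the paper's actual method or API.

```python
import torch

def distill_with_masked_teacher_input(student, teacher, patch_tokens, keep_ratio=0.5):
    """Illustrative only: run the teacher on a subset of patch tokens so its
    forward pass is cheaper. The selection rule in the actual paper may differ
    (e.g., guided by attention rather than random)."""
    B, N, D = patch_tokens.shape
    n_keep = max(1, int(N * keep_ratio))
    # Random subset of token indices per sample (assumption).
    idx = torch.rand(B, N, device=patch_tokens.device).argsort(dim=1)[:, :n_keep]
    kept = torch.gather(patch_tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D))
    with torch.no_grad():
        teacher_logits = teacher(kept)        # cheaper forward on fewer tokens
    student_logits = student(patch_tokens)    # student still sees the full input
    return student_logits, teacher_logits
```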

The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers

S Son, J Ryu, N Lee, J Lee - European Conference on Computer Vision, 2025 - Springer
Knowledge distillation is an effective method for training lightweight vision models.
However, acquiring teacher supervision for training samples is often costly, especially from …

On the Surprising Effectiveness of Attention Transfer for Vision Transformers

AC Li, Y Tian, B Chen, D Pathak, X Chen - arXiv preprint arXiv:2411.09702, 2024 - arxiv.org
Conventional wisdom suggests that pre-training Vision Transformers (ViT) improves
downstream performance by learning useful representations. Is this actually true? We …
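Attention transfer, in the general sense, trains a student to reproduce the teacher's attention maps rather than (or in addition to) its features or outputs. The sketch below shows one common form for ViTs, matching head-averaged attention matrices block by block with an MSE; the tensor shapes and the choice of objective are assumptions, not necessarily the loss used in the paper.

```python
import torch
import torch.nn.functional as F

def attention_transfer_loss(student_attn, teacher_attn):
    """student_attn, teacher_attn: lists of [B, heads, N, N] attention maps,
    one per transformer block, assumed already softmax-normalized and assumed
    to share the same token count N. Matches head-averaged maps with an MSE."""
    loss = 0.0
    for a_s, a_t in zip(student_attn, teacher_attn):
        loss = loss + F.mse_loss(a_s.mean(dim=1), a_t.mean(dim=1))
    return loss / len(student_attn)
```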

Prototype-guided Attention Distillation for Discriminative Person Search

H Kim, J Lee, K Sohn - IEEE Transactions on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Person search aims to localize a person of interest in a large image gallery captured by
multiple, non-overlapping cameras. Prevalent unified methods have suffered from (1) noisy …

Simple Unsupervised Knowledge Distillation With Space Similarity

A Singh, H Wang - European Conference on Computer Vision, 2025 - Springer
According to recent studies, self-supervised learning (SSL) does not readily extend to smaller
architectures. One direction to mitigate this shortcoming while simultaneously training a …

Knowledge Distillation in RNN-Attention Models for Early Prediction of Student Performance

S Leelaluk, C Tang, V Švábenský… - arXiv preprint arXiv …, 2024 - arxiv.org
Educational data mining (EDM) is a part of applied computing that focuses on automatically
analyzing data from learning contexts. Early prediction for identifying at-risk students is a …

KS-DETR: Knowledge Sharing in Attention Learning for Detection Transformer

K Zhao, N Ukita - arXiv preprint arXiv:2302.11208, 2023 - arxiv.org
Scaled dot-product attention applies a softmax function to the scaled dot-product of queries
and keys to calculate weights and then multiplies the weights by the values. In this work, we …
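The computation described in the snippet is standard scaled dot-product attention; a minimal sketch, assuming tensors of shape [batch, heads, length, head_dim], is:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: [batch, heads, seq_len, head_dim]
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # scaled dot-product of queries and keys
    weights = F.softmax(scores, dim=-1)           # softmax over keys gives the weights
    return weights @ v                            # weights multiplied by the values
```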

Exemplar-Free Continual Learning in Vision Transformers via Feature Attention Distillation

X Dai, J Cheng, Z Wei, B Du - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
In this paper, we propose a new approach for continual learning based on Vision
Transformers (ViTs). The purpose of continual learning is to address the catastrophic …