Teacher-free distillation via regularizing intermediate representation

P Dong, L Li, Z Wei - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

Knowledge distillation (KD) is an effective training strategy to improve the
lightweight student models under the guidance of cumbersome teachers. However, the large …

Cited by 65 · Related articles · All 9 versions

[PDF] thecvf.com

Automated knowledge distillation via Monte Carlo tree search

L Li, P Dong, Z Wei, Y Yang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

In this paper, we present Auto-KD, the first automated search framework for optimal
knowledge distillation design. Traditional distillation techniques typically require handcrafted …

Cited by 38 · Related articles · All 3 versions

[PDF] neurips.cc

Shadow knowledge distillation: Bridging offline and online knowledge transfer

L Li, Z Jin - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc

Knowledge distillation can be generally divided into offline and online categories
according to whether teacher model is pre-trained and persistent during the distillation …

Cited by 60 · Related articles · All 5 versions

[PDF] neurips.cc

KD-Zero: Evolving knowledge distiller for any teacher-student pairs

L Li, P Dong, A Li, Z Wei… - Advances in Neural …, 2023 - proceedings.neurips.cc

Knowledge distillation (KD) has emerged as an effective technique for compressing
models that can enhance the lightweight model. Conventional KD methods propose various …

Cited by 28 · Related articles · All 3 versions

[PDF] ecva.net

Self-regulated feature learning via teacher-free feature distillation

L Li - European Conference on Computer Vision, 2022 - Springer

Knowledge distillation conditioned on intermediate feature representations always
leads to significant performance improvements. Conventional feature distillation framework …

Cited by 65 · Related articles · All 3 versions

[PDF] thecvf.com

EMQ: Evolving training-free proxies for automated mixed precision quantization

P Dong, L Li, Z Wei, X Niu, Z Tian… - Proceedings of the …, 2023 - openaccess.thecvf.com

Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity
trade-off for models. Conventional training-based search methods require time-consuming …

Cited by 34 · Related articles · All 6 versions

[PDF] aaai.org

Auto-Prox: Training-free vision transformer architecture search via automatic proxy discovery

Z Wei, P Dong, Z Hui, A Li, L Li, M Lu, H Pan… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

The substantial success of Vision Transformer (ViT) in computer vision tasks is largely
attributed to the architecture design. This underscores the necessity of efficient architecture …

Cited by 26 · Related articles · All 4 versions

[PDF] aaai.org

SasWOT: Real-time semantic segmentation architecture search without training

C Zhu, L Li, Y Wu, Z Sun - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org

In this paper, we present SasWOT, the first training-free Semantic segmentation Architecture
Search (SAS) framework via an auto-discovery proxy. Semantic segmentation is widely used …

Cited by 19 · Related articles · All 2 versions

[PDF] arxiv.org

TVT: Training-free vision transformer search on tiny datasets

Z Wei, H Pan, L Li, P Dong, D Li - International Conference on Pattern …, 2024 - Springer

Training-free Vision Transformer (ViT) architecture search is presented to search for
a better ViT with zero-cost proxies. While ViTs achieve significant distillation gains from CNN …

Cited by 11 · Related articles · All 2 versions

[PDF] arxiv.org

GP-NAS-ensemble: a model for NAS Performance Prediction

K Chen, L Yang, Y Chen, K Chen, Y Xu, L Li - arXiv preprint arXiv …, 2023 - arxiv.org

It is of great significance to estimate the performance of a given model architecture without
training in the application of Neural Architecture Search (NAS) as it may take a lot of time to …

Cited by 19 · Related articles · All 4 versions
