Visual attention network

MH Guo, CZ Lu, ZN Liu, MM Cheng, SM Hu - Computational Visual Media, 2023 - Springer
While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …
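For readers unfamiliar with the mechanism the snippet refers to, below is a minimal single-head scaled dot-product self-attention block in PyTorch; the shapes and layer names are illustrative, and this is not the attention variant proposed in the paper.

```python
# Minimal single-head scaled dot-product self-attention (the generic
# mechanism the snippet refers to); sizes and names are illustrative.
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_qkv = nn.Linear(dim, 3 * dim)   # joint Q, K, V projection
        self.proj = nn.Linear(dim, dim)         # output projection
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), e.g. flattened image patches
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.proj(attn @ v)


x = torch.randn(2, 196, 64)                     # 14x14 patch tokens
print(SelfAttention(64)(x).shape)               # torch.Size([2, 196, 64])
```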

Masked generative distillation

Z Yang, Z Li, M Shao, D Shi, Z Yuan, C Yuan - European Conference on …, 2022 - Springer
Knowledge distillation has been applied to various tasks successfully. The current
distillation algorithm usually improves students' performance by imitating the output of the …

Adan: Adaptive Nesterov momentum algorithm for faster optimizing deep models

X Xie, P Zhou, H Li, Z Lin, S Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
In deep learning, different kinds of deep networks typically need different optimizers, which
have to be chosen after multiple trials, making the training process inefficient. To relieve this …

From knowledge distillation to self-knowledge distillation: A unified approach with normalized loss and customized soft labels

Z Yang, A Zeng, Z Li, T Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Knowledge Distillation (KD) uses the teacher's prediction logits as soft labels to
guide the student, while self-KD does not need a real teacher to provide the soft labels. This …
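As context for the abstract, here is a minimal sketch of the standard temperature-scaled soft-label KD loss it builds on; this is the generic formulation, not the normalized loss or customized soft labels proposed in the paper.

```python
# Generic temperature-scaled soft-label KD loss: KL divergence between
# teacher and student predictions (the standard formulation, not the
# paper's normalized variant).
import torch
import torch.nn.functional as F


def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 4.0) -> torch.Tensor:
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # T^2 keeps gradient magnitudes comparable to the hard-label loss
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2


s = torch.randn(8, 100)          # student logits, 100 classes
t = torch.randn(8, 100)          # teacher logits
print(kd_loss(s, t).item())
```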

ViTKD: Practical guidelines for ViT feature knowledge distillation

Z Yang, Z Li, A Zeng, Z Li, C Yuan, Y Li - arXiv preprint arXiv:2209.02432, 2022 - arxiv.org
Knowledge Distillation (KD) for Convolutional Neural Network (CNN) is extensively studied
as a way to boost the performance of a small model. Recently, Vision Transformer (ViT) has …
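For orientation, a common feature-imitation baseline in this setting aligns the student's features to the teacher's width and regresses them with an MSE loss; the sketch below shows that generic baseline, not ViTKD's layer-specific guidelines.

```python
# Generic feature-imitation distillation: project the student's feature
# map to the teacher's width, then regress it with MSE. A common baseline
# for feature KD, not the method proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureKD(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # linear adapter so the two feature spaces are comparable
        self.align = nn.Linear(student_dim, teacher_dim)

    def forward(self, f_student: torch.Tensor,
                f_teacher: torch.Tensor) -> torch.Tensor:
        # f_student: (batch, tokens, student_dim)
        # f_teacher: (batch, tokens, teacher_dim)
        return F.mse_loss(self.align(f_student), f_teacher)


loss = FeatureKD(192, 768)(torch.randn(4, 197, 192), torch.randn(4, 197, 768))
print(loss.item())
```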

Bit-shrinking: Limiting instantaneous sharpness for improving post-training quantization

C Lin, B Peng, Z Li, W Tan, Y Ren… - Proceedings of the …, 2023 - openaccess.thecvf.com
Post-training quantization (PTQ) is an effective compression method to reduce the model
size and computational cost. However, quantizing a model into a low-bit one, e.g., lower than …
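To make the setting concrete, here is a minimal sketch of symmetric uniform fake quantization, the basic operation PTQ methods work with; the bit-shrinking procedure itself is not shown, and the max-abs scale rule is just the plain default.

```python
# Minimal symmetric uniform quantize/dequantize ("fake quant") of a
# weight tensor; the scale is the plain max-abs choice, not the paper's
# bit-shrinking schedule.
import torch


def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8-bit
    scale = w.abs().max() / qmax
    w_int = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return w_int * scale                        # dequantized weights


w = torch.randn(64, 64)
for bits in (8, 4, 2):
    err = (fake_quantize(w, bits) - w).abs().mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```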

EfficientCLIP: Efficient cross-modal pre-training by ensemble confident learning and language modeling

J Wang, H Wang, J Deng, W Wu, D Zhang - arXiv preprint arXiv …, 2021 - arxiv.org
While large-scale pre-training has achieved great success in bridging the gap
between vision and language, it still faces several challenges. First, the cost for pre-training …
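For context, below is a minimal sketch of the symmetric contrastive (InfoNCE) objective commonly used for CLIP-style cross-modal pre-training; the ensemble confident learning and language-modeling components of the paper are not shown.

```python
# Generic symmetric contrastive (InfoNCE) loss over paired image and text
# embeddings, the standard CLIP-style objective; not EfficientCLIP's
# training pipeline.
import torch
import torch.nn.functional as F


def clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(img_emb.size(0))           # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


print(clip_loss(torch.randn(16, 512), torch.randn(16, 512)).item())
```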

Regularization of polynomial networks for image recognition

GG Chrysos, B Wang, J Deng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Deep Neural Networks (DNNs) have obtained impressive performance across
tasks; however, they still remain black boxes, e.g., hard to theoretically analyze. At the …

OpenMixup: Open mixup toolbox and benchmark for visual representation learning

S Li, Z Wang, Z Liu, D Wu, SZ Li - arXiv preprint arXiv:2209.04851, 2022 - arxiv.org
With the remarkable progress of deep neural networks in computer vision, data mixing
augmentation techniques are widely studied to alleviate problems of degraded …
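As a reference point, here is a minimal sketch of the classic mixup operation that such toolboxes benchmark; this is generic mixup, not OpenMixup's own API.

```python
# Classic mixup: convex combination of two training examples and their
# one-hot labels; the basic operation mixup toolboxes benchmark.
import torch


def mixup(x: torch.Tensor, y_onehot: torch.Tensor, alpha: float = 1.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))                 # pair each sample with another
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix


images = torch.randn(8, 3, 32, 32)
labels = torch.eye(10)[torch.randint(0, 10, (8,))]   # one-hot labels
mixed_x, mixed_y = mixup(images, labels)
print(mixed_x.shape, mixed_y.sum(dim=1))             # label rows still sum to 1
```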

Detecting out-of-distribution data through in-distribution class prior

X Jiang, F Liu, Z Fang, H Chen, T Liu… - International …, 2023 - proceedings.mlr.press
Given a pre-trained in-distribution (ID) model, inference-time out-of-distribution (OOD)
detection aims to recognize OOD data during the inference stage. However, some …
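For context, here is a minimal sketch of the maximum-softmax-probability (MSP) baseline often used for inference-time OOD detection; it is shown only to make the task concrete and is not the class-prior method proposed in the paper.

```python
# Maximum softmax probability (MSP), a common inference-time OOD baseline
# for a pre-trained ID classifier; not the paper's class-prior method.
import torch
import torch.nn.functional as F


@torch.no_grad()
def msp_score(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    # higher score -> more likely in-distribution
    return F.softmax(model(x), dim=-1).max(dim=-1).values


model = torch.nn.Linear(128, 10)                     # stand-in for a trained classifier
scores = msp_score(model, torch.randn(4, 128))
is_ood = scores < 0.5                                # threshold is dataset-dependent
print(scores, is_ood)
```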