Visual attention network

MH Guo, CZ Lu, ZN Liu, MM Cheng, SM Hu - Computational Visual Media, 2023 - Springer
While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …
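For readers unfamiliar with the mechanism the snippet refers to, below is a minimal single-head scaled dot-product self-attention block in PyTorch; the shapes and layer names are illustrative, and this is not the attention variant proposed in the paper.

```python
# Minimal single-head scaled dot-product self-attention (the generic
# mechanism the snippet refers to); sizes and names are illustrative.
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_qkv = nn.Linear(dim, 3 * dim)   # joint Q, K, V projection
        self.proj = nn.Linear(dim, dim)         # output projection
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), e.g. flattened image patches
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.proj(attn @ v)


x = torch.randn(2, 196, 64)                     # 14x14 patch tokens
print(SelfAttention(64)(x).shape)               # torch.Size([2, 196, 64])
```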

Masked generative distillation

Z Yang, Z Li, M Shao, D Shi, Z Yuan, C Yuan - European Conference on …, 2022 - Springer
Knowledge distillation has been applied to various tasks successfully. The current
distillation algorithm usually improves students' performance by imitating the output of the …

Adan: Adaptive Nesterov momentum algorithm for faster optimizing deep models

X Xie, P Zhou, H Li, Z Lin, S Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
In deep learning, different kinds of deep networks typically need different optimizers, which
have to be chosen after multiple trials, making the training process inefficient. To relieve this …

From knowledge distillation to self-knowledge distillation: A unified approach with normalized loss and customized soft labels

Z Yang, A Zeng, Z Li, T Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Knowledge Distillation (KD) uses the teacher's prediction logits as soft labels to
guide the student, while self-KD does not need a real teacher to provide the soft labels. This …
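As context for the abstract, here is a minimal sketch of the standard temperature-scaled soft-label KD loss it builds on; this is the generic formulation, not the normalized loss or customized soft labels proposed in the paper.

```python
# Generic temperature-scaled soft-label KD loss: KL divergence between
# teacher and student predictions (the standard formulation, not the
# paper's normalized variant).
import torch
import torch.nn.functional as F


def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 4.0) -> torch.Tensor:
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # T^2 keeps gradient magnitudes comparable to the hard-label loss
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2


s = torch.randn(8, 100)          # student logits, 100 classes
t = torch.randn(8, 100)          # teacher logits
print(kd_loss(s, t).item())
```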

ViTKD: Practical guidelines for ViT feature knowledge distillation

Z Yang, Z Li, A Zeng, Z Li, C Yuan, Y Li - arXiv preprint arXiv:2209.02432, 2022 - arxiv.org
Knowledge Distillation (KD) for Convolutional Neural Network (CNN) is extensively studied
as a way to boost the performance of a small model. Recently, Vision Transformer (ViT) has …
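For orientation, a common feature-imitation baseline in this setting aligns the student's features to the teacher's width and regresses them with an MSE loss; the sketch below shows that generic baseline, not ViTKD's layer-specific guidelines.

```python
# Generic feature-imitation distillation: project the student's feature
# map to the teacher's width, then regress it with MSE. A common baseline
# for feature KD, not the method proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureKD(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # linear adapter so the two feature spaces are comparable
        self.align = nn.Linear(student_dim, teacher_dim)

    def forward(self, f_student: torch.Tensor,
                f_teacher: torch.Tensor) -> torch.Tensor:
        # f_student: (batch, tokens, student_dim)
        # f_teacher: (batch, tokens, teacher_dim)
        return F.mse_loss(self.align(f_student), f_teacher)


loss = FeatureKD(192, 768)(torch.randn(4, 197, 192), torch.randn(4, 197, 768))
print(loss.item())
```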

Bit-shrinking: Limiting instantaneous sharpness for improving post-training quantization

C Lin, B Peng, Z Li, W Tan, Y Ren… - Proceedings of the …, 2023 - openaccess.thecvf.com
Post-training quantization (PTQ) is an effective compression method to reduce the model
size and computational cost. However, quantizing a model into a low-bit one, e.g., lower than …
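To make the setting concrete, here is a minimal sketch of symmetric uniform fake quantization, the basic operation PTQ methods work with; the bit-shrinking procedure itself is not shown, and the max-abs scale rule is just the plain default.

```python
# Minimal symmetric uniform quantize/dequantize ("fake quant") of a
# weight tensor; the scale is the plain max-abs choice, not the paper's
# bit-shrinking schedule.
import torch


def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8-bit
    scale = w.abs().max() / qmax
    w_int = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return w_int * scale                        # dequantized weights


w = torch.randn(64, 64)
for bits in (8, 4, 2):
    err = (fake_quantize(w, bits) - w).abs().mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```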

EfficientCLIP: Efficient cross-modal pre-training by ensemble confident learning and language modeling

J Wang, H Wang, J Deng, W Wu, D Zhang - arXiv preprint arXiv …, 2021 - arxiv.org
While large-scale pre-training has achieved great success in bridging the gap
between vision and language, it still faces several challenges. First, the cost for pre-training …
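For context, below is a minimal sketch of the symmetric contrastive (InfoNCE) objective commonly used for CLIP-style cross-modal pre-training; the ensemble confident learning and language-modeling components of the paper are not shown.

```python
# Generic symmetric contrastive (InfoNCE) loss over paired image and text
# embeddings, the standard CLIP-style objective; not EfficientCLIP's
# training pipeline.
import torch
import torch.nn.functional as F


def clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(img_emb.size(0))           # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


print(clip_loss(torch.randn(16, 512), torch.randn(16, 512)).item())
```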

Regularization of polynomial networks for image recognition

GG Chrysos, B Wang, J Deng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Deep Neural Networks (DNNs) have obtained impressive performance across
tasks; however, they still remain black boxes, e.g., hard to theoretically analyze. At the …

OpenMixup: Open mixup toolbox and benchmark for visual representation learning

S Li, Z Wang, Z Liu, D Wu, SZ Li - arXiv preprint arXiv:2209.04851, 2022 - arxiv.org
With the remarkable progress of deep neural networks in computer vision, data mixing
augmentation techniques are widely studied to alleviate problems of degraded …
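As a reference point, here is a minimal sketch of the classic mixup operation that such toolboxes benchmark; this is generic mixup, not OpenMixup's own API.

```python
# Classic mixup: convex combination of two training examples and their
# one-hot labels; the basic operation mixup toolboxes benchmark.
import torch


def mixup(x: torch.Tensor, y_onehot: torch.Tensor, alpha: float = 1.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))                 # pair each sample with another
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix


images = torch.randn(8, 3, 32, 32)
labels = torch.eye(10)[torch.randint(0, 10, (8,))]   # one-hot labels
mixed_x, mixed_y = mixup(images, labels)
print(mixed_x.shape, mixed_y.sum(dim=1))             # label rows still sum to 1
```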

Detecting out-of-distribution data through in-distribution class prior

X Jiang, F Liu, Z Fang, H Chen, T Liu… - International …, 2023 - proceedings.mlr.press
Given a pre-trained in-distribution (ID) model, inference-time out-of-distribution (OOD)
detection aims to recognize OOD data during the inference stage. However, some …
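For context, here is a minimal sketch of the maximum-softmax-probability (MSP) baseline often used for inference-time OOD detection; it is shown only to make the task concrete and is not the class-prior method proposed in the paper.

```python
# Maximum softmax probability (MSP), a common inference-time OOD baseline
# for a pre-trained ID classifier; not the paper's class-prior method.
import torch
import torch.nn.functional as F


@torch.no_grad()
def msp_score(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    # higher score -> more likely in-distribution
    return F.softmax(model(x), dim=-1).max(dim=-1).values


model = torch.nn.Linear(128, 10)                     # stand-in for a trained classifier
scores = msp_score(model, torch.randn(4, 128))
is_ood = scores < 0.5                                # threshold is dataset-dependent
print(scores, is_ood)
```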