Review of large vision models and visual prompt engineering

J Wang, Z Liu, L Zhao, Z Wu, C Ma, S Yu, H Dai… - Meta-Radiology, 2023 - Elsevier
Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …
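
As a concrete illustration of point-based visual prompting, here is a minimal sketch using the publicly released segment-anything package; the checkpoint filename, image path, and click coordinates are placeholders rather than values from the survey.

```python
# Point-prompt segmentation with segment-anything
# (https://github.com/facebookresearch/segment-anything).
import numpy as np
import cv2
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # computes the image embedding once per image

# A single foreground click serves as the visual prompt.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),  # (x, y) pixel coordinates
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    multimask_output=True,                # return several candidate masks
)
best_mask = masks[scores.argmax()]
```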

Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2023 - dl.acm.org
Fine-tuning visual models has been widely shown to deliver promising performance on many
downstream visual tasks. With the rapid development of pre-trained visual foundation …
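
One common visual-tuning recipe is linear probing: freeze the pre-trained backbone and train only a new task head. The sketch below uses torchvision's ResNet-50 purely for illustration; the model choice, class count, and hyper-parameters are assumptions, not details from the survey.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in backbone.parameters():
    p.requires_grad = False                           # keep pre-trained features fixed
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new head for 10 classes

optimizer = torch.optim.AdamW(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    optimizer.zero_grad()
    loss = criterion(backbone(x), y)
    loss.backward()                # gradients flow only into the new head
    optimizer.step()
    return loss.item()
```

Full fine-tuning differs only in leaving requires_grad enabled, typically with a smaller learning rate for the backbone.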

SCConv: Spatial and channel reconstruction convolution for feature redundancy

J Li, Y Wen, L He - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Convolutional Neural Networks (CNNs) have achieved remarkable performance in
various computer vision tasks, but this comes at the cost of tremendous computational …

Effective whole-body pose estimation with two-stages distillation

Z Yang, A Zeng, C Yuan, Y Li - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Whole-body pose estimation localizes the human body, hand, face, and foot keypoints in an
image. This task is challenging due to multi-scale body parts, fine-grained localization for …

EfficientSAM: Leveraged masked image pretraining for efficient segment anything

Y Xiong, B Varadarajan, L Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Segment Anything Model (SAM) has emerged as a powerful tool for numerous
vision applications. A key component that drives the impressive performance for zero-shot …
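
For intuition, the sketch below shows a toy MAE-style masked-pretraining step: hide a random subset of patch tokens and regress the hidden positions against target embeddings. All dimensions are illustrative, and note that EfficientSAM's SAMI objective specifically reconstructs features from SAM's image encoder rather than generic targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedPretrainer(nn.Module):
    def __init__(self, dim=192, depth=4, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, dim)  # predicts the target embedding

    def forward(self, tokens, targets):
        B, N, D = tokens.shape
        num_mask = int(N * self.mask_ratio)
        # choose a random subset of token positions to hide
        masked = torch.rand(B, N, device=tokens.device).argsort(1)[:, :num_mask]
        hide = torch.zeros(B, N, dtype=torch.bool, device=tokens.device)
        hide.scatter_(1, masked, True)
        x = torch.where(hide.unsqueeze(-1), self.mask_token.expand(B, N, D), tokens)
        pred = self.head(self.encoder(x))
        # compute the loss only on the hidden positions
        idx = masked.unsqueeze(-1).expand(-1, -1, D)
        return F.mse_loss(torch.gather(pred, 1, idx), torch.gather(targets, 1, idx))

model = MaskedPretrainer()
toks = torch.randn(2, 196, 192)            # 2 images, 196 patch tokens each
loss = model(toks, targets=toks.detach())  # toy targets for demonstration
```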

Decoupled multimodal distilling for emotion recognition

Y Li, Y Wang, Z Cui - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Human multimodal emotion recognition (MER) aims to perceive human emotions via
language, visual and acoustic modalities. Despite the impressive performance of previous …

A survey on model compression for large language models

X Zhu, J Li, Y Liu, C Ma, W Wang - arXiv preprint arXiv:2308.07633, 2023 - arxiv.org
Large Language Models (LLMs) have achieved remarkable success across natural language
processing tasks. However, their formidable size and computational demands present …
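
As one concrete technique in the family such surveys cover, the sketch below applies PyTorch's post-training dynamic quantization, storing linear-layer weights as int8; the two-layer MLP is a stand-in model, not anything from the survey.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # quantize only Linear weights
)
x = torch.randn(1, 768)
print(quantized(x).shape)  # same interface, smaller weight footprint
```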

Multi-level logit distillation

Y Jin, J Wang, D Lin - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Knowledge Distillation (KD) aims to transfer knowledge from a large teacher
model to a lightweight student model. Mainstream KD methods can be divided into two …
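
For reference, the vanilla logit-distillation objective that multi-level variants build on is the KL divergence between temperature-softened teacher and student distributions; the temperature value below is an illustrative choice.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)
    # the T^2 factor keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

loss = kd_loss(torch.randn(8, 100), torch.randn(8, 100))
```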

Curriculum temperature for knowledge distillation

Z Li, X Li, L Yang, B Zhao, R Song, L Luo, J Li… - Proceedings of the …, 2023 - ojs.aaai.org
Most existing distillation methods ignore the flexible role of the temperature in the loss
function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In …
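
A toy stand-in for this idea appears below: rather than fixing the temperature, vary it over training on a simple schedule. The actual method (CTKD) learns the temperature adversarially under a curriculum; the linear schedule and endpoint values here are assumptions for illustration only.

```python
def scheduled_temperature(step, total_steps, t_start=1.0, t_end=4.0):
    """Linearly anneal the distillation temperature over training."""
    frac = min(step / total_steps, 1.0)
    return t_start + (t_end - t_start) * frac

# usage with a temperature-scaled KD loss such as kd_loss above:
# T = scheduled_temperature(step, total_steps)
# loss = kd_loss(student_logits, teacher_logits, T=T)
```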

From knowledge distillation to self-knowledge distillation: A unified approach with normalized loss and customized soft labels

Z Yang, A Zeng, Z Li, T Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Knowledge Distillation (KD) uses the teacher's prediction logits as soft labels to
guide the student, while self-KD does not require a real teacher to produce the soft labels. This …
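
One common self-KD pattern, sketched below, replaces the separate teacher with an exponential-moving-average (EMA) copy of the student that supplies the soft labels; this is a generic stand-in for intuition, not the customized soft labels the paper proposes.

```python
import copy
import torch
import torch.nn.functional as F

def make_ema_teacher(student):
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad = False       # the EMA copy is never trained directly
    return teacher

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(decay).add_(ps, alpha=1 - decay)

def self_kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    ce = F.cross_entropy(student_logits, labels)          # hard-label term
    kd = F.kl_div(F.log_softmax(student_logits / T, -1),  # soft-label term
                  F.softmax(teacher_logits / T, -1),
                  reduction="batchmean") * T * T
    return (1 - alpha) * ce + alpha * kd
```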