The Caltech-UCSD Birds-200-2011 dataset

Y Song, T Wang, P Cai, SK Mondal… - ACM Computing Surveys, 2023 - dl.acm.org

Few-shot learning (FSL) has emerged as an effective learning method and shows great
potential. Despite the recent creative works in tackling FSL tasks, learning valid information …

Cited by 187 · Related articles · All 3 versions

Weakly supervised object localization and detection: A survey

D Zhang, J Han, G Cheng… - IEEE transactions on …, 2021 - ieeexplore.ieee.org

As an emerging and challenging problem in the computer vision community, weakly
supervised object localization and detection plays an important role for developing new …

Cited by 282 · Related articles · All 9 versions

Scaling vision transformers to 22 billion parameters

M Dehghani, J Djolonga, B Mustafa… - International …, 2023 - proceedings.mlr.press

The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …

Cited by 320 · Related articles · All 9 versions

Visual prompt tuning

M Jia, L Tang, BC Chen, C Cardie, S Belongie… - … on Computer Vision, 2022 - Springer

The current modus operandi in adapting pre-trained models involves updating all the
backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) …

Cited by 1045 · Related articles · All 7 versions

SLIP: Self-supervision meets language-image pre-training

N Mu, A Kirillov, D Wagner, S Xie - European conference on computer …, 2022 - Springer

Recent work has shown that self-supervised pre-training leads to improvements over
supervised learning on challenging visual recognition tasks. CLIP, an exciting new …

Cited by 351 · Related articles · All 9 versions

Conformer: Local features coupling global representations for visual recognition

Z Peng, W Huang, S Gu, L Xie… - Proceedings of the …, 2021 - openaccess.thecvf.com

Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but experience difficulty to capture global representations. Within …

Cited by 570 · Related articles · All 8 versions

Deep spectral methods: A surprisingly strong baseline for unsupervised semantic segmentation and localization

L Melas-Kyriazi, C Rupprecht… - Proceedings of the …, 2022 - openaccess.thecvf.com

Unsupervised localization and segmentation are long-standing computer vision challenges
that involve decomposing an image into semantically-meaningful segments without any …

Cited by 120 · Related articles · All 10 versions

Interpretable machine learning: Fundamental principles and 10 grand challenges

C Rudin, C Chen, Z Chen, H Huang… - Statistic …, 2022 - projecteuclid.org

Interpretability in machine learning (ML) is crucial for high stakes decisions and
troubleshooting. In this work, we provide fundamental principles for interpretable ML, and …

Cited by 645 · Related articles · All 7 versions

Delving into out-of-distribution detection with vision-language representations

Y Ming, Z Cai, J Gu, Y Sun, W Li… - Advances in neural …, 2022 - proceedings.neurips.cc

Recognizing out-of-distribution (OOD) samples is critical for machine learning systems
deployed in the open world. The vast majority of OOD detection methods are driven by a …

Cited by 98 · Related articles · All 4 versions

Towards language-free training for text-to-image generation

Y Zhou, R Zhang, C Chen, C Li… - Proceedings of the …, 2022 - openaccess.thecvf.com

One of the major challenges in training text-to-image generation models is the need of a
large number of high-quality text-image pairs. While image samples are often easily …

Cited by 213 · Related articles · All 8 versions
