Weakly supervised object localization and detection: A survey

D Zhang, J Han, G Cheng… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
As an emerging and challenging problem in the computer vision community, weakly
supervised object localization and detection plays an important role for developing new …

Semantic image segmentation: Two decades of research

G Csurka, R Volpi, B Chidlovskii - Foundations and Trends® …, 2022 - nowpublishers.com
Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer
vision applications, providing key information for the global understanding of an image. This …

Multi-class token transformer for weakly supervised semantic segmentation

L Xu, W Ouyang, M Bennamoun… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper proposes a new transformer-based framework to learn class-specific object
localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS) …

Learning affinity from attention: End-to-end weakly-supervised semantic segmentation with transformers

L Ru, Y Zhan, B Yu, B Du - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
Weakly-supervised semantic segmentation (WSSS) with image-level labels is an important
and challenging task. Due to the high training efficiency, end-to-end solutions for WSSS …

Conformer: Local features coupling global representations for visual recognition

Z Peng, W Huang, S Gu, L Xie… - Proceedings of the …, 2021 - openaccess.thecvf.com
Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but experience difficulty to capture global representations. Within …

Token contrast for weakly-supervised semantic segmentation

L Ru, H Zheng, Y Zhan, B Du - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Weakly-Supervised Semantic Segmentation (WSSS) using image-level labels
typically utilizes Class Activation Map (CAM) to generate the pseudo labels. Limited by the …

HTS-AT: A hierarchical token-semantic audio transformer for sound classification and detection

K Chen, X Du, B Zhu, Z Ma… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Audio classification is an important task of mapping audio samples into their corresponding
labels. Recently, the transformer model with self-attention mechanisms has been adopted in …

Generative prompt model for weakly supervised object localization

Y Zhao, Q Ye, W Wu, C Shen… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Weakly supervised object localization (WSOL) remains challenging when learning object
localization models from image category labels. Conventional methods that discriminatively …

TransMix: Attend to mix for vision transformers

JN Chen, S Sun, J He, PHS Torr… - Proceedings of the …, 2022 - openaccess.thecvf.com
Mixup-based augmentation has been found to be effective for generalizing models during
training, especially for Vision Transformers (ViTs) since they can easily overfit. However …

CLIP Surgery for better explainability with enhancement in open-vocabulary tasks

Y Li, H Wang, Y Duan, X Li - arXiv preprint arXiv:2304.05653, 2023 - arxiv.org
Contrastive Language-Image Pre-training (CLIP) is a powerful multimodal large vision
model that has demonstrated significant benefits for downstream tasks, including many zero …