Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. This …
This paper proposes a new transformer-based framework to learn class-specific object localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS) …
L Ru, Y Zhan, B Yu, B Du - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
Weakly-supervised semantic segmentation (WSSS) with image-level labels is an important and challenging task. Due to the high training efficiency, end-to-end solutions for WSSS …
Z Peng, W Huang, S Gu, L Xie… - Proceedings of the …, 2021 - openaccess.thecvf.com
Within Convolutional Neural Network (CNN), the convolution operations are good at extracting local features but experience difficulty to capture global representations. Within …
L Ru, H Zheng, Y Zhan, B Du - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Weakly-Supervised Semantic Segmentation (WSSS) using image-level labels typically utilizes Class Activation Map (CAM) to generate the pseudo labels. Limited by the …
K Chen, X Du, B Zhu, Z Ma… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Audio classification is an important task of mapping audio samples into their corresponding labels. Recently, the transformer model with self-attention mechanisms has been adopted in …
Mixup-based augmentation has been found to be effective for generalizing models during training, especially for Vision Transformers (ViTs) since they can easily overfit. However …
Contrastive Language-Image Pre-training (CLIP) is a powerful multimodal large vision model that has demonstrated significant benefits for downstream tasks, including many zero …