AdaptFormer: Adapting vision transformers for scalable visual recognition

S Chen, C Ge, Z Tong, J Wang… - Advances in …, 2022 - proceedings.neurips.cc
Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …

ImageNet-21K pretraining for the masses

T Ridnik, E Ben-Baruch, A Noy… - arXiv preprint arXiv …, 2021 - arxiv.org
ImageNet-1K serves as the primary dataset for pretraining deep learning models for
computer vision tasks. ImageNet-21K dataset, which is bigger and more diverse, is used …

Residual attention: A simple but effective method for multi-label recognition

K Zhu, J Wu - Proceedings of the IEEE/CVF international …, 2021 - openaccess.thecvf.com
Multi-label image recognition is a challenging computer vision task of practical use.
Progress in this area, however, is often characterized by complicated methods, heavy …

Detecting and grounding multi-modal media manipulation

R Shao, T Wu, Z Liu - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Misinformation has become a pressing issue. Fake media, in both visual and textual forms,
is widespread on the web. While various deepfake detection and text fake news detection …

Injecting semantic concepts into end-to-end image captioning

Z Fang, J Wang, X Hu, L Liang, Z Gan… - Proceedings of the …, 2022 - openaccess.thecvf.com
Tremendous progress has been made in recent years in developing better image captioning
models, yet most of them rely on a separate object detector to extract regional features …

ML-Decoder: Scalable and versatile classification head

T Ridnik, G Sharir, A Ben-Cohen… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we introduce ML-Decoder, a new attention-based classification head.
ML-Decoder predicts the existence of class labels via queries, and enables better utilization of …

Re-labeling ImageNet: from single to multi-labels, from global to localized labels

S Yun, SJ Oh, B Heo, D Han… - Proceedings of the …, 2021 - openaccess.thecvf.com
ImageNet has been the most popular image classification benchmark, but it is also the one
with a significant level of label noise. Recent studies have shown that many samples contain …

MRN: A locally and globally mention-based reasoning network for document-level relation extraction

J Li, K Xu, F Li, H Fei, Y Ren, D Ji - Findings of the Association for …, 2021 - aclanthology.org
Document-level relation extraction aims to detect the relations within one document, which is
challenging since it requires complex reasoning using mentions, entities, local and global …

CDUL: CLIP-driven unsupervised learning for multi-label image classification

R Abdelfattah, Q Guo, X Li, X Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper presents a CLIP-based unsupervised learning method for annotation-free
multi-label image classification, including three stages: initialization, training, and inference. At the …

Triplet attention and dual-pool contrastive learning for clinic-driven multi-label medical image classification

Y Zhang, L Luo, Q Dou, PA Heng - Medical image analysis, 2023 - Elsevier
Multi-label classification (MLC) can attach multiple labels to a single image, and has
achieved promising results on medical images. But existing MLC methods still face …