Cut and learn for unsupervised object detection and instance segmentation

X Wang, R Girdhar, SX Yu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We propose Cut-and-LEaRn (CutLER), a simple approach for training
unsupervised object detection and segmentation models. We leverage the property of self …

Scaling vision transformers to gigapixel images via hierarchical self-supervised learning

RJ Chen, C Chen, Y Li, TY Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Vision Transformers (ViTs) and their multi-scale and hierarchical variations have
been successful at capturing image representations but their use has been generally …

Vision transformers need registers

T Darcet, M Oquab, J Mairal, P Bojanowski - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have recently emerged as a powerful tool for learning visual representations.
In this paper, we identify and characterize artifacts in feature maps of both supervised and …
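
The remedy the title alludes to, appending a few learnable "register" tokens to the ViT token sequence and discarding them at the output, can be sketched in a few lines. The toy encoder below only illustrates that mechanism under assumed dimensions (ViT-S-like, 196 patches); it is not the authors' model or training setup.

import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Toy encoder with register tokens: extra learnable tokens are appended to
    the patch sequence, attended to like any other token, and discarded at the
    output (illustration only, not the paper's architecture)."""

    def __init__(self, num_patches=196, dim=384, depth=4, heads=6, num_registers=4):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.num_registers = num_registers

    def forward(self, patch_tokens):                     # (B, N, dim)
        b = patch_tokens.shape[0]
        x = torch.cat([self.cls_token.expand(b, -1, -1), patch_tokens], dim=1)
        x = x + self.pos_embed
        # Registers get no positional embedding; like [CLS], they are position-free.
        x = torch.cat([x, self.registers.expand(b, -1, -1)], dim=1)
        x = self.blocks(x)
        x = x[:, : -self.num_registers]                  # drop registers at the output
        return x[:, 0], x[:, 1:]                         # CLS token, patch tokens

cls_out, patch_out = ViTWithRegisters()(torch.randn(2, 196, 384))
print(cls_out.shape, patch_out.shape)                    # (2, 384), (2, 196, 384)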

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Deep spectral methods: A surprisingly strong baseline for unsupervised semantic segmentation and localization

L Melas-Kyriazi, C Rupprecht… - Proceedings of the …, 2022 - openaccess.thecvf.com
Unsupervised localization and segmentation are long-standing computer vision challenges
that involve decomposing an image into semantically-meaningful segments without any …
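
A minimal sketch of the general idea behind such deep spectral approaches, not the authors' exact pipeline: build an affinity matrix from dense self-supervised ViT patch features, form the normalized graph Laplacian, and read soft segments off its low-frequency eigenvectors (thresholding the Fiedler vector gives a foreground/background split). The feature array and patch-grid shape below are placeholders.

import numpy as np

def spectral_segments(feats, grid_hw, n_vecs=4):
    """Soft segmentation masks from dense patch features via spectral decomposition.

    feats:   (N, D) array of per-patch features (e.g., from a self-supervised ViT)
    grid_hw: (H, W) patch-grid shape with H * W == N
    Returns n_vecs eigenvector "masks" of shape (n_vecs, H, W).
    """
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    affinity = np.clip(f @ f.T, 0, None)      # non-negative cosine affinities
    deg = affinity.sum(axis=1)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(deg + 1e-8)
    lap = np.eye(len(f)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)    # eigenvalues in ascending order
    # Skip the trivial near-constant eigenvector; the next ones are smooth partitions.
    h, w = grid_hw
    return eigvecs[:, 1:1 + n_vecs].T.reshape(n_vecs, h, w)

feats = np.random.randn(14 * 14, 384).astype(np.float32)  # stand-in for real features
masks = spectral_segments(feats, (14, 14))
print(masks.shape)                                         # (4, 14, 14)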

Neural feature fusion fields: 3d distillation of self-supervised 2d image representations

V Tschernezki, I Laina, D Larlus… - … Conference on 3D …, 2022 - ieeexplore.ieee.org
We present Neural Feature Fusion Fields (N3F), a method that improves dense 2D image
feature extractors when the latter are applied to the analysis of multiple images …

Deep ViT features as dense visual descriptors

S Amir, Y Gandelsman, S Bagon… - arXiv preprint arXiv …, 2021 - dino-vit-features.github.io
We study the use of deep features extracted from a pretrained Vision Transformer (ViT) as
dense visual descriptors. We observe and empirically demonstrate that such features, when …
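
As an illustration of extracting such dense descriptors, the sketch below pulls patch tokens from a self-supervised DINO ViT-S/16 and reshapes them into a feature map. It assumes the public facebookresearch/dino torch.hub entry point and its get_intermediate_layers() helper; the paper itself studies several facets of the features (e.g., keys from intermediate layers), whereas this simply takes the final-layer patch tokens.

import torch

# Assumption: the facebookresearch/dino hub entry point; weights download on first use.
model = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
model.eval()

img = torch.randn(1, 3, 224, 224)            # stand-in for a normalized RGB image
with torch.no_grad():
    tokens = model.get_intermediate_layers(img, n=1)[0]  # (1, 1 + N, 384)

patch_tokens = tokens[:, 1:, :]              # drop the [CLS] token
h = w = 224 // 16                            # 14x14 patch grid for ViT-S/16
dense_desc = patch_tokens.reshape(1, h, w, -1).permute(0, 3, 1, 2)  # (1, 384, 14, 14)
print(dense_desc.shape)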

Bridging the gap to real-world object-centric learning

M Seitzer, M Horn, A Zadaianchuk, D Zietlow… - arXiv preprint arXiv …, 2022 - arxiv.org
Humans naturally decompose their environment into entities at the appropriate level of
abstraction to act in the world. Allowing machine learning algorithms to derive this …

FreeSOLO: Learning to segment objects without annotations

X Wang, Z Yu, S De Mello, J Kautz… - Proceedings of the …, 2022 - openaccess.thecvf.com
Instance segmentation is a fundamental vision task that aims to recognize and segment
each object in an image. However, it requires costly annotations such as bounding boxes …

Exploiting unlabeled data with vision and language models for object detection

S Zhao, Z Zhang, S Schulter, L Zhao… - European conference on …, 2022 - Springer
Building robust and generic object detection frameworks requires scaling to larger label
spaces and bigger training datasets. However, it is prohibitively costly to acquire annotations …