Object detection using deep learning, CNNs and vision transformers: A review

AB Amjoud, M Amrouch - IEEE Access, 2023 - ieeexplore.ieee.org
Detecting objects remains one of the most fundamental and challenging aspects of computer
vision and image understanding applications. Significant advances in object detection have …

Recent advances on loss functions in deep learning for computer vision

Y Tian, D Su, S Lauria, X Liu - Neurocomputing, 2022 - Elsevier
The loss function, also known as the cost function, is used for training a neural network or
other machine learning models. Over the past decade, researchers have designed many loss …

Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we present an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

Image as a foreign language: BEiT pretraining for vision and vision-language tasks

W Wang, H Bao, L Dong, J Bjorck… - Proceedings of the …, 2023 - openaccess.thecvf.com
A big convergence of language, vision, and multimodal pretraining is emerging. In this work,
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves …

InternImage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

Depth anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

DINO: DETR with improved denoising anchor boxes for end-to-end object detection

H Zhang, F Li, S Liu, L Zhang, H Su, J Zhu… - arXiv preprint arXiv …, 2022 - arxiv.org
We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art
end-to-end object detector. DINO improves over …

Mask DINO: Towards a unified transformer-based framework for object detection and segmentation

F Li, H Zhang, H Xu, S Liu, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper we present Mask DINO, a unified object detection and segmentation
framework. Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by …

Detecting twenty-thousand classes using image-level supervision

X Zhou, R Girdhar, A Joulin, P Krähenbühl… - European Conference on …, 2022 - Springer
Current object detectors are limited in vocabulary size due to the small scale of detection
datasets. Image classifiers, on the other hand, reason about much larger vocabularies, as …