A survey on contrastive self-supervised learning

A Jaiswal, AR Babu, MZ Zadeh, D Banerjee… - Technologies, 2020 - mdpi.com
Self-supervised learning has gained popularity because of its ability to avoid the cost of
annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as …

Machine learning and landslide studies: recent advances and applications

FS Tehrani, M Calvello, Z Liu, L Zhang, S Lacasse - Natural Hazards, 2022 - Springer
Upon the introduction of machine learning (ML) and its variants, in the form that we know
today, to the landslide community, many studies have been carried out to explore the …

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

Scaling vision transformers to 22 billion parameters

M Dehghani, J Djolonga, B Mustafa… - International …, 2023 - proceedings.mlr.press
The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …

Depth anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

Unified-IO: A unified model for vision, language, and multi-modal tasks

J Lu, C Clark, R Zellers, R Mottaghi… - The Eleventh …, 2022 - openreview.net
We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical
computer vision tasks, including pose estimation, object detection, depth estimation and …

MMBench: Is your multi-modal model an all-around player?

Y Liu, H Duan, Y Zhang, B Li, S Zhang, W Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Large vision-language models have recently achieved remarkable progress, exhibiting
great perception and reasoning abilities concerning visual information. However, how to …

Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs

X Ding, X Zhang, J Han, G Ding - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …

Out-of-distribution detection with deep nearest neighbors

Y Sun, Y Ming, X Zhu, Y Li - International Conference on …, 2022 - proceedings.mlr.press
Out-of-distribution (OOD) detection is a critical task for deploying machine learning
models in the open world. Distance-based methods have demonstrated promise, where …

Conditional prompt learning for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …