A survey on contrastive self-supervised learning

A Jaiswal, AR Babu, MZ Zadeh, D Banerjee… - Technologies, 2020 - mdpi.com
Self-supervised learning has gained popularity because of its ability to avoid the cost of
annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as …

Machine learning and landslide studies: recent advances and applications

FS Tehrani, M Calvello, Z Liu, L Zhang, S Lacasse - Natural Hazards, 2022 - Springer
Upon the introduction of machine learning (ML) and its variants, in the form that we know
today, to the landslide community, many studies have been carried out to explore the …

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

Scaling vision transformers to 22 billion parameters

M Dehghani, J Djolonga, B Mustafa… - International …, 2023 - proceedings.mlr.press
The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …

Depth anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

Unified-IO: A unified model for vision, language, and multi-modal tasks

J Lu, C Clark, R Zellers, R Mottaghi… - The Eleventh …, 2022 - openreview.net
We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical
computer vision tasks, including pose estimation, object detection, depth estimation and …

MMBench: Is your multi-modal model an all-around player?

Y Liu, H Duan, Y Zhang, B Li, S Zhang, W Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Large vision-language models have recently achieved remarkable progress, exhibiting
great perception and reasoning abilities concerning visual information. However, how to …

Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs

X Ding, X Zhang, J Han, G Ding - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …

Out-of-distribution detection with deep nearest neighbors

Y Sun, Y Ming, X Zhu, Y Li - International Conference on …, 2022 - proceedings.mlr.press
Out-of-distribution (OOD) detection is a critical task for deploying machine learning
models in the open world. Distance-based methods have demonstrated promise, where …

Conditional prompt learning for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …