Image data augmentation approaches: A comprehensive survey and future directions

T Kumar, A Mileo, R Brennan… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep learning (DL) algorithms have shown significant performance in various computer
vision tasks. However, having limited labelled data leads to a network overfitting problem …

MobileCLIP: Fast image-text models through multi-modal reinforced training

PKA Vasu, H Pouransari, F Faghri… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive pre-training of image-text foundation models such as CLIP demonstrated
excellent zero-shot performance and improved robustness on a wide range of downstream …

On the Efficacy of Multi-scale Data Samplers for Vision Applications

E Nunez, T Merth, A Prabhu, M Farajtabar… - arXiv preprint arXiv …, 2023 - arxiv.org
Multi-scale resolution training has seen an increased adoption across multiple vision tasks,
including classification and detection. Training with smaller resolutions enables faster …

CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement

M Salehi, M Farajtabar, M Horton, F Faghri… - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive language image pretraining (CLIP) is a standard method for training
vision-language models. While CLIP is scalable, promptable, and robust to distribution shifts on …

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

S Mehta, M Horton, F Faghri, MH Sekhavat… - arXiv preprint arXiv …, 2024 - arxiv.org
Contrastive learning has emerged as a transformative method for learning effective visual
representations through the alignment of image and text embeddings. However, pairwise …

Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

R Vemulapalli, H Pouransari, F Faghri, S Mehta… - Forty-first International … - openreview.net
Vision Foundation Models (VFMs) pretrained on massive datasets exhibit impressive
performance on various downstream tasks, especially with limited labeled target data …