Reproducible scaling laws for contrastive language-image learning

M Cherti, R Beaumont, R Wightman… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scaling up neural networks has led to remarkable performance across a wide range of
tasks. Moreover, performance often follows reliable scaling laws as a function of training set …
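A scaling law of the kind this entry describes is typically a power law relating a training-scale quantity to error, which becomes a straight line in log-log space. A minimal sketch, using purely illustrative numbers (not figures from the paper) and assuming a fit of the form E(n) = β·n^α:

```python
import numpy as np

# Hypothetical measurements: samples seen during pre-training vs. zero-shot error.
# (Illustrative numbers only, not taken from the paper.)
samples_seen = np.array([1e8, 4e8, 1.6e9, 6.4e9, 1.28e10])
zero_shot_error = np.array([0.52, 0.44, 0.37, 0.32, 0.29])

# A power law E(n) = beta * n^alpha is linear in log-log space,
# so a straight-line fit recovers the scaling exponent alpha.
alpha, log_beta = np.polyfit(np.log(samples_seen), np.log(zero_shot_error), 1)
```

A negative `alpha` indicates error falling predictably with scale, which is what makes such fits useful for extrapolating to larger training runs.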

Sigmoid loss for language image pre-training

X Zhai, B Mustafa, A Kolesnikov… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose a simple pairwise sigmoid loss for image-text pre-training. Unlike standard
contrastive learning with softmax normalization, the sigmoid loss operates solely on image …
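The pairwise sigmoid loss the snippet describes scores each image-text pair independently with a binary objective, rather than normalizing over the whole batch with a softmax. A minimal NumPy sketch, with assumed temperature and bias values chosen only for illustration:

```python
import numpy as np

def pairwise_sigmoid_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Sketch of a pairwise sigmoid loss over unit-normalized image/text
    embeddings of shape [n, d]. Matched pairs sit on the diagonal of the
    similarity matrix; every other pair is treated as a negative."""
    logits = temperature * img_emb @ txt_emb.T + bias  # [n, n] pair logits
    labels = 2.0 * np.eye(len(img_emb)) - 1.0          # +1 on diagonal, -1 off
    # per-pair binary loss: log(1 + exp(-label * logit)), averaged over all pairs
    return np.mean(np.log1p(np.exp(-labels * logits)))

# Toy usage with unit-normalized random embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
loss = pairwise_sigmoid_loss(emb, emb)
```

Because each pair contributes an independent binary term, there is no batch-wide normalization, which is what allows the loss to decouple from batch size.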

Non-contrastive learning meets language-image pre-training

J Zhou, L Dong, Z Gan, L Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Contrastive language-image pre-training (CLIP) serves as a de facto standard to align
images and texts. Nonetheless, the loose correlation between images and texts of web …

FILIP: Fine-grained interactive language-image pre-training

L Yao, R Huang, L Hou, G Lu, M Niu, H Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
Unsupervised large-scale vision-language pre-training has shown promising advances on
various downstream tasks. Existing methods often model the cross-modal interaction either …

Learning customized visual models with retrieval-augmented knowledge

H Liu, K Son, J Yang, C Liu, J Gao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Image-text contrastive learning models such as CLIP have demonstrated strong task transfer
ability. The high generality and usability of these visual models are achieved via a web-scale …

Finetune like you pretrain: Improved finetuning of zero-shot vision models

S Goyal, A Kumar, S Garg, Z Kolter… - Proceedings of the …, 2023 - openaccess.thecvf.com
Finetuning image-text models such as CLIP achieves state-of-the-art accuracies on a variety
of benchmarks. However, recent works (Kumar et al., 2022; Wortsman et al., 2021) have …

Demystifying CLIP data

H Xu, S Xie, XE Tan, PY Huang, R Howes… - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive Language-Image Pre-training (CLIP) is an approach that has advanced
research and applications in computer vision, fueling modern recognition systems and …

Chinese CLIP: Contrastive vision-language pretraining in Chinese

A Yang, J Pan, J Lin, R Men, Y Zhang, J Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
The tremendous success of CLIP (Radford et al., 2021) has promoted the research and
application of contrastive learning for vision-language pretraining. In this work, we construct …

Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm

Y Li, F Liang, L Zhao, Y Cui, W Ouyang, J Shao… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted
unprecedented attention for its impressive zero-shot recognition ability and excellent …

LAION-5B: An open large-scale dataset for training next generation image-text models

C Schuhmann, R Beaumont, R Vencu… - Advances in …, 2022 - proceedings.neurips.cc
Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of
training on large amounts of noisy image-text data, without relying on expensive accurate …