Improving CLIP Training with Language Rewrites

L Fan, D Krishnan, P Isola… - Advances in Neural …, 2024 - proceedings.neurips.cc
Contrastive Language-Image Pre-training (CLIP) stands as one of the most effective and scalable methods for training transferable vision models using paired image and text …
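
As a concrete reference for the pre-training objective these papers build on, here is a minimal sketch of CLIP's symmetric contrastive (InfoNCE) loss. The random tensors stand in for image/text encoder outputs, and the temperature value is illustrative, not the paper's.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over cosine-similarity logits.

    image_emb, text_emb: (batch, dim) embeddings of paired images and
    captions; row i of each tensor comes from the same image-text pair.
    """
    # L2-normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast each image against all texts, and each text against all images.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random features standing in for encoder outputs.
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_contrastive_loss(img, txt).item())
```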

An Inverse Scaling Law for CLIP Training

X Li, Z Wang, C Xie - Advances in Neural Information …, 2024 - proceedings.neurips.cc
CLIP, one of the pioneering foundation models that connect images and text, has enabled
many recent breakthroughs in computer vision. However, its associated training cost is …

A Closer Look at the Robustness of Contrastive Language-Image Pre-training (CLIP)

W Tu, W Deng, T Gedeon - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Contrastive Language-Image Pre-training (CLIP) models have demonstrated remarkable generalization capabilities across multiple challenging distribution shifts …

SoftCLIP: Softer Cross-Modal Alignment Makes CLIP Stronger

Y Gao, J Liu, Z Xu, T Wu, E Zhang, K Li… - Proceedings of the …, 2024 - ojs.aaai.org
Over the past two years, vision-language pre-training has achieved noteworthy success on several downstream tasks. Nevertheless, acquiring high-quality image-text pairs …
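
The paper's exact soft-target construction is not reproduced here; the sketch below only illustrates the general idea of softening the one-hot contrastive targets, using intra-modal image self-similarity as an assumed relatedness signal.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(image_emb, text_emb, temperature=0.07, alpha=0.2):
    """Cross-entropy against soft targets instead of strict one-hot labels."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    logits = image_emb @ text_emb.t() / temperature

    # Soft targets: blend the one-hot labels with a distribution derived
    # from image-image self-similarity (a stand-in relatedness signal).
    with torch.no_grad():
        self_sim = F.softmax(image_emb @ image_emb.t() / temperature, dim=-1)
        one_hot = torch.eye(logits.size(0), device=logits.device)
        targets = (1 - alpha) * one_hot + alpha * self_sim

    log_probs = F.log_softmax(logits, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()
```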

Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision

Y Cui, L Zhao, F Liang, Y Li, J Shao - arXiv preprint arXiv:2203.05796, 2022 - arxiv.org
Contrastive Language-Image Pretraining (CLIP) has emerged as a novel paradigm to learn
visual models from language supervision. While researchers continue to push the frontier of …

Unsupervised Prompt Learning for Vision-Language Models

T Huang, J Chu, F Wei - arXiv preprint arXiv:2204.03649, 2022 - arxiv.org
Contrastive vision-language models like CLIP have shown great progress in transfer learning. At the inference stage, a proper text description, also known as a prompt, needs to …
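
To make the role of the prompt concrete, here is a minimal zero-shot classification sketch using OpenAI's open-source clip package (github.com/openai/CLIP); the label set, prompt template, and image path are illustrative placeholders.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["dog", "cat", "car"]  # hypothetical label set
prompts = [f"a photo of a {c}" for c in class_names]

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then score the image against every class prompt.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(class_names, probs[0].tolist())))
```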

Attentive Mask CLIP

Y Yang, W Huang, Y Wei, H Peng… - Proceedings of the …, 2023 - openaccess.thecvf.com
In vision-language modeling, image token removal is an efficient augmentation technique to
reduce the cost of encoding image features. The CLIP-style models, however, have been …
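
A sketch of the token-removal idea follows; the per-token relevance scores are random stand-ins here, whereas the paper derives them from an auxiliary vision encoder's attention.

```python
import torch

def keep_top_tokens(patch_tokens, scores, keep_ratio=0.5):
    """patch_tokens: (batch, num_tokens, dim); scores: (batch, num_tokens)."""
    batch, num_tokens, dim = patch_tokens.shape
    k = max(1, int(num_tokens * keep_ratio))
    # Indices of the k highest-scoring tokens per image.
    top_idx = scores.topk(k, dim=1).indices  # (batch, k)
    # Gather the retained tokens; the rest are removed, not just zeroed,
    # so the encoder's sequence length (and cost) actually shrinks.
    idx = top_idx.unsqueeze(-1).expand(-1, -1, dim)
    return patch_tokens.gather(1, idx)

tokens = torch.randn(2, 196, 768)   # e.g. 14x14 ViT patch tokens
scores = torch.rand(2, 196)         # stand-in for attention-derived relevance
reduced = keep_top_tokens(tokens, scores, keep_ratio=0.5)
print(reduced.shape)  # torch.Size([2, 98, 768])
```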

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Z Sun, Y Fang, T Wu, P Zhang, Y Zang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive Language-Image Pre-training (CLIP) plays an essential role in extracting valuable content information from images across diverse tasks. It aligns textual …
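
As an illustration of the general idea (not the paper's exact recipe), one way to condition the model on a region of interest is to widen a ViT patch-embedding stem from RGB to RGBA, zero-initializing the alpha filters so the widened model initially matches the RGB one.

```python
import torch
import torch.nn as nn

rgb_patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)  # stand-in ViT stem

# Widen to RGBA: copy the pretrained RGB filters and zero-init the alpha
# filters, so the alpha channel contributes nothing until it is trained.
rgba_patch_embed = nn.Conv2d(4, 768, kernel_size=16, stride=16)
with torch.no_grad():
    rgba_patch_embed.weight[:, :3] = rgb_patch_embed.weight
    rgba_patch_embed.weight[:, 3:].zero_()
    rgba_patch_embed.bias.copy_(rgb_patch_embed.bias)

image = torch.randn(1, 3, 224, 224)
alpha = torch.zeros(1, 1, 224, 224)
alpha[:, :, 64:160, 64:160] = 1.0  # hypothetical region of interest
tokens = rgba_patch_embed(torch.cat([image, alpha], dim=1))
print(tokens.shape)  # torch.Size([1, 768, 14, 14])
```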

Scaling Language-Image Pre-training via Masking

Y Li, H Fan, R Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present Fast Language-Image Pre-training (FLIP), a simple and more efficient method for training CLIP. Our method randomly masks out and removes a large portion of …
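
A minimal sketch of the random masking step, assuming ViT-style patch tokens; the 0.75 mask ratio is one example of the "large portion" the abstract describes.

```python
import torch

def random_mask_tokens(patch_tokens, mask_ratio=0.75):
    """patch_tokens: (batch, num_tokens, dim); keeps a random
    (1 - mask_ratio) subset per image, removing the rest from the sequence
    so each training step encodes far fewer tokens."""
    batch, num_tokens, dim = patch_tokens.shape
    num_keep = max(1, int(num_tokens * (1 - mask_ratio)))
    # Random permutation per image; keep the first num_keep positions.
    noise = torch.rand(batch, num_tokens, device=patch_tokens.device)
    keep_idx = noise.argsort(dim=1)[:, :num_keep]  # (batch, num_keep)
    idx = keep_idx.unsqueeze(-1).expand(-1, -1, dim)
    return patch_tokens.gather(1, idx)

tokens = torch.randn(4, 196, 768)
visible = random_mask_tokens(tokens, mask_ratio=0.75)
print(visible.shape)  # torch.Size([4, 49, 768])
```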

EVA-CLIP: Improved Training Techniques for CLIP at Scale

Q Sun, Y Fang, L Wu, X Wang, Y Cao - arXiv preprint arXiv:2303.15389, 2023 - arxiv.org
Contrastive language-image pre-training, CLIP for short, has gained increasing attention for
its potential in various scenarios. In this paper, we propose EVA-CLIP, a series of models …