MAFA: Managing False Negatives for Vision-Language Pre-training

J Byun, D Kim, T Moon - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We consider a critical issue of false negatives in Vision-Language Pre-training (VLP), a
challenge that arises from the inherent many-to-many correspondence of image-text pairs in …

Converting and Smoothing False Negatives for Vision-Language Pre-training

J Byun, D Kim, T Moon - arXiv preprint arXiv:2312.06112, 2023 - arxiv.org
We consider the critical issue of false negatives in Vision-Language Pre-training (VLP), a
challenge that arises from the inherent many-to-many correspondence of image-text pairs in …
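
Both entries above describe the same problem: in-batch contrastive learning treats every non-paired caption in a batch as a negative, even when it happens to describe the image. The sketch below illustrates one generic mitigation, masking suspiciously similar off-diagonal pairs out of the InfoNCE denominator; the threshold `tau_fn` and the masking rule are illustrative assumptions, not the papers' actual conversion-and-smoothing procedure.

```python
# Minimal sketch (not MAFA's method): a symmetric CLIP-style InfoNCE loss that
# drops likely false negatives -- off-diagonal pairs whose raw similarity
# exceeds an illustrative threshold tau_fn -- from the denominator.
import torch
import torch.nn.functional as F

def contrastive_loss_with_fn_mask(img_emb, txt_emb, temperature=0.07, tau_fn=0.9):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (B, B) scaled similarities
    sim = img_emb @ txt_emb.t()                    # unscaled, used for masking

    B = logits.size(0)
    diag = torch.eye(B, dtype=torch.bool, device=logits.device)
    # Off-diagonal pairs that look as similar as true pairs are plausible
    # false negatives under many-to-many correspondence; exclude them.
    fn_mask = (sim > tau_fn) & ~diag
    logits = logits.masked_fill(fn_mask, float('-inf'))

    targets = torch.arange(B, device=logits.device)
    # Symmetric image-to-text and text-to-image directions, as in CLIP-style VLP.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random embeddings:
loss = contrastive_loss_with_fn_mask(torch.randn(8, 256), torch.randn(8, 256))
```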

Leveraging per image-token consistency for vision-language pre-training

Y Gou, T Ko, H Yang, J Kwok… - Proceedings of the …, 2023 - openaccess.thecvf.com
Most existing vision-language pre-training (VLP) approaches adopt cross-modal masked
language modeling (CMLM) to learn vision-language associations. However, we find that …
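
For context, cross-modal masked language modeling (CMLM) masks caption tokens and predicts them conditioned on the image. A minimal sketch of that generic objective, assuming a toy transformer with illustrative sizes, a BERT-style vocabulary, and a 15% mask rate; this is the baseline the snippet refers to, not the paper's proposed alternative.

```python
# Generic CMLM sketch: image features and text tokens share one encoder, and
# the loss is computed only at masked text positions. All sizes illustrative.
import torch
import torch.nn as nn

class TinyCMLM(nn.Module):
    def __init__(self, vocab_size=30522, dim=256, mask_id=103):
        super().__init__()
        self.mask_id = mask_id                      # BERT-style [MASK] id
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.img_proj = nn.Linear(512, dim)         # project image features
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab_size)      # predict masked token ids

    def forward(self, token_ids, img_feats, mask_rate=0.15):
        labels = token_ids.clone()
        mask = torch.rand_like(token_ids, dtype=torch.float) < mask_rate
        labels[~mask] = -100                        # score masked positions only
        inp = token_ids.masked_fill(mask, self.mask_id)
        x = torch.cat([self.img_proj(img_feats), self.tok_emb(inp)], dim=1)
        x = self.encoder(x)[:, img_feats.size(1):]  # keep text positions
        logits = self.head(x)
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), labels.reshape(-1),
            ignore_index=-100)

# Toy usage: 2 captions of 10 tokens, 4 image-region features of size 512 each.
model = TinyCMLM()
loss = model(torch.randint(0, 30522, (2, 10)), torch.randn(2, 4, 512))
```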

Accelerating vision-language pretraining with free language modeling

T Wang, Y Ge, F Zheng, R Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
The state of the art in vision-language pretraining (VLP) achieves exemplary performance
but suffers from high training costs resulting from slow convergence and long training time …

PyramidCLIP: Hierarchical feature alignment for vision-language model pretraining

Y Gao, J Liu, Z Xu, J Zhang, K Li… - Advances in neural …, 2022 - proceedings.neurips.cc
Large-scale vision-language pre-training has achieved promising results on downstream
tasks. Existing methods rely heavily on the assumption that the image-text pairs crawled from …

Filtering, distillation, and hard negatives for vision-language pre-training

F Radenovic, A Dubey, A Kadian… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-language models trained with contrastive learning on large-scale noisy data are
becoming increasingly popular for zero-shot recognition problems. In this paper, we improve …
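
A common way to use hard negatives in this setting is to restrict each row of the InfoNCE denominator to the positive plus the k most similar non-matching texts. The sketch below shows that generic in-batch variant; the paper's full filtering and distillation pipeline is not reproduced, and `k` is an illustrative choice.

```python
# Generic in-batch hard-negative selection for image-text contrastive loss:
# keep the positive and only the top-k hardest (most similar) negatives.
import torch
import torch.nn.functional as F

def hard_negative_contrastive(img_emb, txt_emb, k=3, temperature=0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    B = logits.size(0)
    diag = torch.eye(B, dtype=torch.bool, device=logits.device)

    # Rank off-diagonal entries; the highest-similarity non-matching texts
    # are the hard negatives that dominate the gradient.
    neg_logits = logits.masked_fill(diag, float('-inf'))
    topk = neg_logits.topk(k=min(k, B - 1), dim=1).indices
    keep = torch.zeros_like(diag).scatter_(1, topk, True) | diag
    logits = logits.masked_fill(~keep, float('-inf'))

    targets = torch.arange(B, device=logits.device)
    return F.cross_entropy(logits, targets)

loss = hard_negative_contrastive(torch.randn(8, 256), torch.randn(8, 256))
```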

SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training

S Wu, H Tan, Z Tian, Y Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language pre-training (VLP) aims to learn joint representations of vision and
language modalities. The contrastive paradigm is currently dominant in this field. However …

TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training

C Jiang, W Ye, H Xu, Q Ye, M Yan, J Zhang… - Proceedings of the …, 2024 - ojs.aaai.org
Self-supervised Multi-modal Contrastive Learning (SMCL) remarkably advances
modern Vision-Language Pre-training (VLP) models by aligning visual and linguistic …
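
A generic mixup-style reading of image mixing for contrastive VLP: blend two images and split the contrastive target between their two captions in proportion to the mixing coefficient. TiMix's text-aware weighting is not reproduced here; `lam` is a plain Beta-sampled coefficient and `image_encoder` is a hypothetical stand-in.

```python
# Mixup-style image mixing for contrastive VLP (generic, not TiMix's exact
# formulation): a mixed image matches both source captions, weighted by lam.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mixed_contrastive_step(images, txt_emb, image_encoder, temperature=0.07):
    B = images.size(0)
    lam = torch.distributions.Beta(0.8, 0.8).sample().item()
    perm = torch.randperm(B)
    mixed = lam * images + (1 - lam) * images[perm]   # pixel-level mixup

    img_emb = F.normalize(image_encoder(mixed), dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature

    targets = torch.arange(B)
    # Soft target: each mixed image is positive for its two source captions.
    return lam * F.cross_entropy(logits, targets) + \
           (1 - lam) * F.cross_entropy(logits, targets[perm])

# Toy usage with a hypothetical linear "encoder" on 32x32 RGB images:
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
loss = mixed_contrastive_step(torch.randn(4, 3, 32, 32), torch.randn(4, 256), encoder)
```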

VLMAE: Vision-language masked autoencoder

S He, T Guo, T Dai, R Qiao, C Wu, X Shu… - arXiv preprint arXiv …, 2022 - arxiv.org
Image and language modeling is of crucial importance for vision-language pre-training
(VLP), which aims to learn multi-modal representations from large-scale paired image-text …
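
For reference, the masked-autoencoder objective the title refers to: hide most image patches and reconstruct their pixels from the rest. The sketch below is image-only and replaces masked patches with a learned token (SimMIM-style) rather than dropping them as MAE does; VLMAE's joint vision-language design is not reproduced, and all sizes are illustrative.

```python
# Minimal masked-autoencoder sketch: mask 75% of patches, reconstruct pixels,
# and compute the loss on masked patches only.
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    def __init__(self, patch_dim=48, dim=128):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.Linear(dim, patch_dim)    # reconstruct raw pixels

    def forward(self, patches, mask_ratio=0.75):
        B, N, _ = patches.shape
        mask = torch.rand(B, N, device=patches.device) < mask_ratio
        x = self.embed(patches)
        # Replace masked patches with a learned token (SimMIM-style).
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, N, -1), x)
        recon = self.decoder(self.encoder(x))
        # Reconstruction loss on masked patches only.
        return ((recon - patches) ** 2)[mask].mean()

# Toy usage: 2 images, each as 64 flattened 4x4x3 patches.
loss = TinyMAE()(torch.randn(2, 64, 48))
```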

CMAL: A novel cross-modal associative learning framework for vision-language pre-training

Z Ma, J Li, G Li, K Huang - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
With the flourishing of social media platforms, vision-language pre-training (VLP) has
recently received great attention, and remarkable progress has been made. The …