Multi-modal representation learning with text-driven soft masks

Citing articles

Egovlpv2: Egocentric video-language pre-training with fusion in the backbone

S Pramanick, Y Song, S Nag, KQ Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com

Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …

Cited by 30; 6 versions

Improving fine-grained understanding in image-text pre-training

I Bica, A Ilić, M Bauer, G Erdogan, M Bošnjak… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce SPARse Fine-grained Contrastive Alignment (SPARC), a simple method for
pretraining more fine-grained multimodal representations from image-text pairs. Given that …

Cross-modal knowledge learning with scene text for fine-grained image classification

L Xiong, Y Mao, Z Wang, B Nie, C Li - IET Image Processing, 2024 - Wiley Online Library

Scene text in natural images carries additional semantic information to aid in image
classification. Existing methods lack full consideration of the deep understanding of the text …

Enhancing Defective Solar Panel Detection with Attention-Guided Statistical Features Using Pre-Trained Neural Networks

H Lee, YH Park, J Yi - … Conference on Big Data and Smart …, 2024 - ieeexplore.ieee.org

For defective solar panel detection, the use of resource-depleting methods such as end-to-end
deep learning models does not serve the purpose of sustainable green energy. A recent …
