EgoVLPv2: Egocentric video-language pre-training with fusion in the backbone

S Pramanick, Y Song, S Nag, KQ Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video-language pre-training (VLP) has become increasingly important due to its ability to
generalize to various vision and language tasks. However, existing egocentric VLP …

Improving fine-grained understanding in image-text pre-training

I Bica, A Ilić, M Bauer, G Erdogan, M Bošnjak… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce SPARse Fine-grained Contrastive Alignment (SPARC), a simple method for
pretraining more fine-grained multimodal representations from image-text pairs. Given that …

Cross‐modal knowledge learning with scene text for fine‐grained image classification

L Xiong, Y Mao, Z Wang, B Nie, C Li - IET Image Processing, 2024 - Wiley Online Library
Scene text in natural images carries additional semantic information to aid in image
classification. Existing methods lack full consideration of the deep understanding of the text …

Enhancing Defective Solar Panel Detection with Attention-Guided Statistical Features Using Pre-Trained Neural Networks

H Lee, YH Park, J Yi - … Conference on Big Data and Smart …, 2024 - ieeexplore.ieee.org
For defective solar panel detection, the use of resource-depleting methods such as end-to-
end deep learning models does not serve the purpose of sustainable green energy. A recent …