ECLIPSE: A resource-efficient text-to-image prior for image generations

M Patel, C Kim, S Cheng, C Baral… - … on Computer Vision …, 2024 - openaccess.thecvf.com
… Given these constraints, our goal is to develop an alternative prior learning methodology
that improves parameter efficiency (97% reduction) and mitigates the need for large-scale high…

Using poll sheets and computer vision as an inexpensive alternative to clickers

J Gain - … African Institute for Computer Scientists and Information …, 2013 - dl.acm.org
… In this paper we present an inexpensive alternative to Clickers. Poll sheets with coloured
blocks … This image is then processed using computer vision to count and classify the students’ …

UNITER: Universal image-text representation learning

YC Chen, L Li, L Yu, A El Kholy, F Ahmed… - … on computer vision, 2020 - Springer
… Self-supervised learning utilizes original data as its own source of supervision, which has
been applied to many Computer Vision tasks, such as image colorization [49], solving jigsaw …

Enriching AI-based Image descriptions for people who are vision-impaired

R Akut - 2023 - escholarship.mcgill.ca
… internet graphics due to lack of alternative text descriptions. Hence several companies have
… Hence in this thesis, we propose DICE, a Computer Vision (CV) based system that can …

Unified contrastive learning in image-text-label space

J Yang, C Li, P Zhang, B Xiao, C Liu… - … on Computer Vision …, 2022 - openaccess.thecvf.com
… We evaluate the quality of learned representations on a set of computer vision tasks, … architecture
for computer vision. In Proceedings of the IEEE conference on computer vision and …

Revisiting scene text recognition: A data perspective

Q Jiang, J Wang, D Peng, C Liu… - … on computer vision, 2023 - openaccess.thecvf.com
… This ICCV paper is the Open Access version, provided by the Computer Vision Foundation.
Except for this watermark, it is identical to the accepted version; the final published version of …

MAFA: Managing False Negatives for Vision-Language Pre-training

J Byun, D Kim, T Moon - … Conference on Computer Vision …, 2024 - openaccess.thecvf.com
… Following ALBEF, we adopt our image encoder as a 12-layer Vision Transformer [12] with
86 million parameters, pre-trained on ImageNet-1k [57]. Both the text and multimodal encoders …

Tem-adapter: Adapting image-text pretraining for video question answer

G Chen, X Liu, G Wang, K Zhang… - … Computer Vision, 2023 - openaccess.thecvf.com
… This ICCV paper is the Open Access version, provided by the Computer Vision Foundation.
Except for … This motivates us to explore cheaper and lighter alternative pre-trained models. …

LAION-400M: Open dataset of CLIP-filtered 400 million image-text pairs

C Schuhmann, R Vencu, R Beaumont… - arXiv preprint arXiv …, 2021 - arxiv.org
… • We use CLIP to compute embeddings of the image and alt-text. Then we compute the
cosine similarity of both embeddings and drop all samples with cosine similarity below 0.3. This …
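
The filtering step described in this snippet can be sketched roughly as follows. This is a minimal illustration, not the LAION pipeline itself: it assumes the CLIP embeddings have already been computed and stored as plain lists of floats, and the names `cosine_similarity`, `filter_pairs`, `image_emb`, and `text_emb` are hypothetical. Only the 0.3 threshold comes from the snippet.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(samples, threshold=0.3):
    """Drop samples whose image/alt-text similarity is below the threshold.

    Each sample is assumed to be a dict holding precomputed embeddings under
    the hypothetical keys "image_emb" and "text_emb".
    """
    return [
        s for s in samples
        if cosine_similarity(s["image_emb"], s["text_emb"]) >= threshold
    ]


# Toy 2-D embeddings standing in for real CLIP outputs.
samples = [
    {"id": "a", "image_emb": [1.0, 0.0], "text_emb": [1.0, 0.0]},  # identical direction
    {"id": "b", "image_emb": [1.0, 0.0], "text_emb": [0.0, 1.0]},  # orthogonal
    {"id": "c", "image_emb": [1.0, 0.0], "text_emb": [1.0, 1.0]},  # 45 degrees apart
]
kept = filter_pairs(samples)
```

With these toy vectors, the orthogonal pair is dropped and the other two are kept.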

ImageBERT: Cross-modal pre-training with large-scale weak-supervised image-text data

D Qi, L Su, J Song, E Cui, T Bharti, A Sacheti - arXiv preprint arXiv …, 2020 - arxiv.org
… (NLP) and computer vision (CV) communities. For example, Text-Image Retrieval[4] aims to
… 3M images with descriptions harvested from the Alt-text HTML attribute of the web pages, …