alt text computer vision - Academic Resource Search

Eclipse: A resource-efficient text-to-image prior for image generations

M Patel, C Kim, S Cheng, C Baral… - … on Computer Vision …, 2024 - openaccess.thecvf.com

… Given these constraints, our goal is to develop an alternative prior learning methodology
that improves parameter efficiency (97% reduction) and mitigates the need for large-scale high…

Cited by 4 · Related articles · All 3 versions

[PDF] uct.ac.za

Using poll sheets and computer vision as an inexpensive alternative to clickers

J Gain - … African Institute for Computer Scientists and Information …, 2013 - dl.acm.org

… In this paper we present an inexpensive alternative to Clickers. Poll sheets with coloured
blocks … This image is then processed using computer vision to count and classify the students’ …

Cited by 10 · Related articles · All 5 versions

[PDF] arxiv.org

Uniter: Universal image-text representation learning

YC Chen, L Li, L Yu, A El Kholy, F Ahmed… - … on computer vision, 2020 - Springer

… Self-supervised learning utilizes original data as its own source of supervision, which has
been applied to many Computer Vision tasks, such as image colorization [49], solving jigsaw …

Cited by 1998 · Related articles · All 7 versions

[PDF] mcgill.ca

Enriching AI-based Image descriptions for people who are vision-impaired

R Akut - 2023 - escholarship.mcgill.ca

… internet graphics due to lack of alternative text descriptions. Hence several companies have
… Hence in this thesis, we propose DICE, a Computer Vision (CV) based system that can …

[PDF] thecvf.com

Unified contrastive learning in image-text-label space

J Yang, C Li, P Zhang, B Xiao, C Liu… - … on Computer Vision …, 2022 - openaccess.thecvf.com

… We evaluate the quality of learned representations on a set of computer vision tasks, … architecture
for computer vision. In Proceedings of the IEEE conference on computer vision and …

Cited by 173 · Related articles · All 5 versions

[PDF] thecvf.com

Revisiting scene text recognition: A data perspective

Q Jiang, J Wang, D Peng, C Liu… - … on computer vision, 2023 - openaccess.thecvf.com

… This ICCV paper is the Open Access version, provided by the Computer Vision Foundation.
Except for this watermark, it is identical to the accepted version; the final published version of …

Cited by 16 · Related articles · All 5 versions

[PDF] thecvf.com

MAFA: Managing False Negatives for Vision-Language Pre-training

J Byun, D Kim, T Moon - … Conference on Computer Vision …, 2024 - openaccess.thecvf.com

… Following ALBEF, we adopt our image encoder as a 12-layer Vision Transformer [12] with
86 million parameters, pre-trained on ImageNet-1k [57]. Both the text and multimodal encoders …

[PDF] thecvf.com

Tem-adapter: Adapting image-text pretraining for video question answer

G Chen, X Liu, G Wang, K Zhang… - … Computer Vision, 2023 - openaccess.thecvf.com

… This motivates us to explore cheaper and lighter alternative pre-trained models. …

Cited by 11 · Related articles · All 6 versions

[PDF] arxiv.org

Laion-400m: Open dataset of clip-filtered 400 million image-text pairs

C Schuhmann, R Vencu, R Beaumont… - arXiv preprint arXiv …, 2021 - arxiv.org

… • We use CLIP to compute embeddings of the image and alt-text. Then we compute the
cosine similarity of both embeddings and drop all samples with cosine similarity below 0.3. This …

Cited by 941 · Related articles · All 7 versions
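
The LAION-400M snippet above describes a filtering rule: embed each image and its alt-text, compute the cosine similarity of the two embeddings, and drop samples that score below 0.3. The following is a minimal sketch of that rule only, not the authors' pipeline. It assumes the embeddings have already been computed; the toy two-dimensional vectors stand in for CLIP outputs, and the names `cosine_similarity`, `filter_pairs`, and `samples` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar
    return dot / (norm_a * norm_b)


def filter_pairs(pairs, threshold=0.3):
    """Keep (image_emb, text_emb, alt_text) samples scoring at or above threshold."""
    return [p for p in pairs if cosine_similarity(p[0], p[1]) >= threshold]


# Toy embeddings standing in for CLIP outputs (hypothetical values).
samples = [
    ([1.0, 0.0], [0.8, 0.6], "a dog on grass"),  # well aligned, kept
    ([1.0, 0.0], [0.0, 1.0], "buy now"),         # orthogonal, dropped
    ([1.0, 0.0], [-1.0, 0.0], "unrelated"),      # opposite, dropped
]

kept = filter_pairs(samples)
print([alt_text for _, _, alt_text in kept])
```

In a real setting the vectors would come from a CLIP image encoder and text encoder; only the thresholding step is illustrated here.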

[PDF] arxiv.org

Imagebert: Cross-modal pre-training with large-scale weak-supervised image-text data

D Qi, L Su, J Song, E Cui, T Bharti, A Sacheti - arXiv preprint arXiv …, 2020 - arxiv.org

… (NLP) and computer vision (CV) communities. For example, Text-Image Retrieval[4] aims to
… 3M images with descriptions harvested from the Alt-text HTML attribute of the web pages, …

Cited by 264 · Related articles · All 2 versions
