alt text computer vision - Academic Resource Search

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

X Li, X Yin, C Li, P Zhang, X Hu, L Zhang… - Computer Vision–ECCV …, 2020 - Springer

… modal representations on image-text pairs are becoming popular for vision-language tasks.
… text features as input to the model to be pre-trained and use self-attention to learn image-text …

Cited by 1913 · Related articles · All 6 versions

Comparison of computer vision approaches in application to the electricity and gas meter reading

M Spichkova, J Van Zyl, S Sachdev, A Bhardwaj… - Evaluation of Novel …, 2020 - Springer

… convenient alternative method for their current meter reading updating system. The proposed
solution is to use computer vision techniques for capturing readings. One of the alternative …

Cited by 15 · Related articles · All 4 versions

[PDF] researchgate.net

[PDF] Relational Learning in Computer Vision.

N Messina, F Falchi, G Amato, M Avvenuti, J Lokoc… - 2022 - researchgate.net

… This framework overturned many computer science fields, like Computer Vision and Natural
Language Processing, obtaining astonishing results. Nevertheless, many challenges are …

Cited by 1 · Related articles · All 2 versions

[PDF] thecvf.com

Groupvit: Semantic segmentation emerges from text supervision

J Xu, S De Mello, S Liu, W Byeon… - … Computer Vision …, 2022 - openaccess.thecvf.com

… Inspired by the success of Transformers in NLP [20, 76], the Vision Transformer (ViT) [22]
was recently proposed and has been successfully applied to multiple computer vision tasks, …

Cited by 399 · Related articles · All 6 versions

[HTML] nih.gov

Training affective computer vision models by crowdsourcing soft-target labels

P Washington, H Kalantarian, J Kent, A Husic… - Cognitive …, 2021 - Springer

… Fundamental to a successful computer vision approach for affective computing is the feature
representation of the image, and there are several approaches to engineering such features…

Cited by 18 · Related articles · All 9 versions

[PDF] arxiv.org

Generative negative text replay for continual vision-language pretraining

S Yan, L Hong, H Xu, J Han, T Tuytelaars, Z Li… - … on Computer Vision, 2022 - Springer

… -text retrieval. It is a significant step towards flexible and practical zero-shot classifiers for
computer vision … In this work, we concentrate on the continual vision-language representation …

Cited by 14 · Related articles · All 7 versions

[PDF] thecvf.com

Image as a foreign language: Beit pretraining for vision and vision-language tasks

W Wang, H Bao, L Dong, J Bjorck… - … Computer Vision …, 2023 - openaccess.thecvf.com

… , provided by the Computer Vision Foundation. Except … Text Reconstruction We study the
effects of text reconstruction on monomodal and multimodal data. As shown in Table 8e, the text …

Cited by 369 · Related articles · All 5 versions

[PDF] stanford.edu

Understanding blind people's experiences with computer-generated captions of social media images

H MacLeod, CL Bennett, MR Morris… - proceedings of the 2017 …, 2017 - dl.acm.org

… This requires not only advanced computer vision and deep learning techniques but also …
This account consisted of 14 tweets, each with an image and an alt text caption. The tweets in …

Cited by 199 · Related articles · All 10 versions

[PDF] thecvf.com

Distilling vision-language models on millions of videos

Y Zhao, L Zhao, X Zhou, J Wu… - … Computer Vision …, 2024 - openaccess.thecvf.com

… training the dual-encoder model on VideoCC with alttext. Specifically, training with only 1% …
∼7M), indicating that the original alttext scales poorly. We attribute the alt-text’s inferior perfor…

Cited by 6 · Related articles · All 4 versions

Make-a-scene: Scene-based text-to-image generation with human priors

O Gafni, A Polyak, O Ashual, S Sheynin… - … on Computer Vision, 2022 - Springer

… relationships between human gaze, description, and computer vision. In: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, pp. 739–746 (2013) …

Cited by 389 · Related articles · All 4 versions
