Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

X Li, X Yin, C Li, P Zhang, X Hu, L Zhang… - Computer Vision–ECCV …, 2020 - Springer
… modal representations on image-text pairs are becoming popular for vision-language tasks.
text features as input to the model to be pre-trained and use self-attention to learn image-text

Comparison of computer vision approaches in application to the electricity and gas meter reading

M Spichkova, J Van Zyl, S Sachdev, A Bhardwaj… - Evaluation of Novel …, 2020 - Springer
… convenient alternative method for their current meter reading updating system. The proposed
solution is to use computer vision techniques for capturing readings. One of the alternative

Relational Learning in Computer Vision.

N Messina, F Falchi, G Amato, M Avvenuti, J Lokoc… - 2022 - researchgate.net
… This framework overturned many computer science fields, like Computer Vision and Natural
Language Processing, obtaining astonishing results. Nevertheless, many challenges are …

GroupViT: Semantic segmentation emerges from text supervision

J Xu, S De Mello, S Liu, W Byeon… - … Computer Vision …, 2022 - openaccess.thecvf.com
… Inspired by the success of Transformers in NLP [20, 76], the Vision Transformer (ViT) [22]
was recently proposed and has been successfully applied to multiple computer vision tasks, …

Training affective computer vision models by crowdsourcing soft-target labels

P Washington, H Kalantarian, J Kent, A Husic… - Cognitive …, 2021 - Springer
… Fundamental to a successful computer vision approach for affective computing is the feature
representation of the image, and there are several approaches to engineering such features…

Generative negative text replay for continual vision-language pretraining

S Yan, L Hong, H Xu, J Han, T Tuytelaars, Z Li… - … on Computer Vision, 2022 - Springer
… -text retrieval. It is a significant step towards flexible and practical zero-shot classifiers for
computer vision … In this work, we concentrate on the continual vision-language representation …

Image as a foreign language: BEiT pretraining for vision and vision-language tasks

W Wang, H Bao, L Dong, J Bjorck… - … Computer Vision …, 2023 - openaccess.thecvf.com
… , provided by the Computer Vision Foundation. Except … Text Reconstruction We study the
effects of text reconstruction on monomodal and multimodal data. As shown in Table 8e, the text

Understanding blind people's experiences with computer-generated captions of social media images

H MacLeod, CL Bennett, MR Morris… - proceedings of the 2017 …, 2017 - dl.acm.org
… This requires not only advanced computer vision and deep learning techniques but also …
This account consisted of 14 tweets, each with an image and an alt text caption. The tweets in …

Distilling vision-language models on millions of videos

Y Zhao, L Zhao, X Zhou, J Wu… - … Computer Vision …, 2024 - openaccess.thecvf.com
… training the dual-encoder model on VideoCC with alt-text. Specifically, training with only 1% …
∼7M), indicating that the original alt-text scales poorly. We attribute the alt-text’s inferior perfor…

Make-a-scene: Scene-based text-to-image generation with human priors

O Gafni, A Polyak, O Ashual, S Sheynin… - … on Computer Vision, 2022 - Springer
… relationships between human gaze, description, and computer vision. In: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, pp. 739–746 (2013) …