Thinking fast and slow: Efficient text-to-visual retrieval with transformers

A Miech, JB Alayrac, I Laptev, J Sivic… - … on Computer Vision …, 2021 - openaccess.thecvf.com
Our objective is language-based search of large-scale image and video datasets. For this
task, the approach that consists of independently mapping text and vision to a joint embedding …

SciOL and MuLMS-Img: Introducing A Large-Scale Multimodal Scientific Dataset and Models for Image-Text Tasks in the Scientific Domain

T Tarsi, H Adel, JH Metzen, D Zhang… - … of Computer Vision, 2024 - openaccess.thecvf.com
… to the HCI alt-text dataset [8] for our evaluation. It consists of 3386 scientific figures with alt-text
descriptions extracted from publications on Human-Computer Interaction and accessibility. …

LocVTP: Video-text pre-training for temporal localization

M Cao, T Yang, J Weng, C Zhang, J Wang… - … on Computer Vision, 2022 - Springer
… 3) The recent work CLIP [43] provides a stronger vision encoder and we also evaluate the
performance based on it. It is shown that the CLIP’s weights greatly improve the performance …

Improving accessibility of the web with a computer game

L Von Ahn, S Ginosar, M Kedia, R Liu… - Proceedings of the …, 2006 - dl.acm.org
… In essence, we solve a typical computer vision problem with … the images on the Web have
an HTML ALT caption). Today, it is the … Rather than designing a computer vision algorithm that …

WebInSight: making web images accessible

JP Bigham, RS Kaminsky, RE Ladner… - Proceedings of the 8th …, 2006 - dl.acm.org
alternative text. To ameliorate this problem, we introduce WebInSight, a system that
automatically creates and inserts alternative text … image using computer vision techniques is …

Unleash the Potential of Upstream Data Using Search, AI and Computer Vision

HM Asfoor, DA Alharbi - Abu Dhabi International Petroleum Exhibition …, 2022 - onepetro.org
… Enterprise Search, AI and Computer Vision to construct a single … Fourth, applying Computer
Vision techniques to extract … Figure 7, that utilizes Computer Vision to detect and extract …

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

X Li, X Yin, C Li, P Zhang, X Hu, L Zhang… - Computer Vision–ECCV …, 2020 - Springer
… modal representations on image-text pairs are becoming popular for vision-language tasks.
text features as input to the model to be pre-trained and use self-attention to learn image-text

Comparison of computer vision approaches in application to the electricity and gas meter reading

M Spichkova, J Van Zyl, S Sachdev, A Bhardwaj… - Evaluation of Novel …, 2020 - Springer
… convenient alternative method for their current meter reading updating system. The proposed
solution is to use computer vision techniques for capturing readings. One of the alternative

Relational Learning in Computer Vision.

N Messina, F Falchi, G Amato, M Avvenuti, J Lokoc… - 2022 - researchgate.net
… This framework overturned many computer science fields, like Computer Vision and Natural
Language Processing, obtaining astonishing results. Nevertheless, many challenges are …

GroupViT: Semantic segmentation emerges from text supervision

J Xu, S De Mello, S Liu, W Byeon… - … Computer Vision …, 2022 - openaccess.thecvf.com
… Inspired by the success of Transformers in NLP [20, 76], the Vision Transformer (ViT) [22]
was recently proposed and has been successfully applied to multiple computer vision tasks, …