alt text computer vision - Academic Resource Search

Computer vision-enhanced selection of geo-tagged photos on social network sites for land cover classification

MM ElQadi, M Lesiv, AG Dyer, A Dorin - Environmental Modelling & …, 2020 - Elsevier

… Our framework uses computer vision to analyse the content of geo-tagged photos on social
network sites to generate descriptive tags. These are used to train artificial neural networks to …

Cited by 12 · Related articles · All 12 versions

[PDF] thecvf.com

Position-guided text prompt for vision-language pre-training

J Wang, P Zhou, MZ Shou… - … on Computer Vision and …, 2023 - openaccess.thecvf.com

Vision-Language Pre-Training (VLP) has shown promising capabilities to align image and
text pairs, facilitating a broad variety of cross-modal learning tasks. However, we observe that …

Cited by 25 · Related articles · All 4 versions

[PDF] osf.io

ViSpa (Vision Spaces): A computer-vision-based representation system for individual images and concept prototypes, with large-scale evaluation.

F Günther, M Marelli, S Tureski, MA Petilli - Psychological Review, 2023 - psycnet.apa.org

… In order to model vision-based mental … , vision-based representations obtained from
computer-vision DCNNs have received considerably less attention as compared to computer science…

Cited by 22 · Related articles · All 13 versions

[PDF] thecvf.com

Learning deep structure-preserving image-text embeddings

L Wang, Y Li, S Lazebnik - … conference on computer vision …, 2016 - openaccess.thecvf.com

This paper proposes a method for learning joint embeddings of images and text using a two-branch
neural network with multiple layers of linear projections followed by nonlinearities. …

Cited by 934 · Related articles · All 13 versions

[PDF] iiit.ac.in

Classroom slide narration system

KV Jobin, A Mondal, CV Jawahar - … Conference on Computer Vision and …, 2021 - Springer

… CSNS and the existing assistive systems: Automatic Alt-Text (AAT) [30] and Tesseract OCR [26].
The proposed system generates a markup text, where each logical content is tagged with its …

Cited by 2 · Related articles · All 7 versions

[BOOK][B] Digital image processing and analysis: human and computer vision applications with CVIPtools

SE Umbaugh - 2010 - books.google.com

… Divided into five major sections, this book provides the concepts and models required to
analyze digital images and develop computer vision and human consumption applications as …

Cited by 460 · Related articles · All 6 versions

Medical Prescription Label Reading Using Computer Vision and Deep Learning

A Henry, R Sujee - Soft Computing for Problem Solving: Proceedings of …, 2023 - Springer

… reduced dimensions and then fed into two alternative architectures, CRNN alone and EAST
… to conventional text using these models. After obtaining the texts, the text is calculated using …

Learning transferable visual models from natural language supervision

A Radford, JW Kim, C Hallacy… - International …, 2021 - proceedings.mlr.press

… SOTA computer vision systems are trained to predict a fixed … from raw text about images
is a promising alternative which … We study performance on over 30 different computer vision …

Cited by 20617 · Related articles · All 20 versions

[PDF] arxiv.org

Learning audio-video modalities from image captions

A Nagrani, PH Seo, B Seybold, A Hauth… - … on Computer Vision, 2022 - Springer

… -text datasets, as images with alt-text captions can be easily obtained online. Obtaining
large-scale, high quality data for video in the form of text-video and text-… are suitable for text-audio …

Cited by 78 · Related articles · All 8 versions

[PDF] arxiv.org

StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation

A Maharana, D Hannan, M Bansal - … Conference on Computer Vision, 2022 - Springer

Recent advances in text-to-image synthesis have led to large pretrained transformers with
excellent capabilities to generate visualizations from a given text. However, these models are …

Cited by 44 · Related articles · All 5 versions
