alt text computer vision - Academic Resource Search

Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts

S Changpinyo, P Sharma, N Ding… - … on computer vision …, 2021 - openaccess.thecvf.com

… To arrive at CC12M, we keep the image-text filtering intact, and relax the unimodal filters …
Second, in text-based filtering, we allow text between 3 and 256 words in the alt-text. We still …

Cited by 802 - Related articles - All 9 versions

[PDF] academia.edu

[BOOK][B] Computer vision: algorithms and applications

R Szeliski - 2022 - books.google.com

… Yvan Leclerc and Pascal Fua, colleagues from my brief interlude at SRI International, gave
me new perspectives on alternative approaches to computer vision. During my six years of …

Cited by 9481 - Related articles - All 31 versions

[PDF] mlr.press

Scaling up visual and vision-language representation learning with noisy text supervision

C Jia, Y Yang, Y Xia, YT Chen… - International …, 2021 - proceedings.mlr.press

… In this work, we leverage a dataset of over one billion noisy image alt-text pairs to scale
visual and vision-language representation learning. We follow the procedures described in the …

Cited by 2925 - Related articles - All 6 versions

[PDF] thecvf.com

LiT: Zero-shot transfer with locked-image text tuning

X Zhai, X Wang, B Mustafa, A Steiner… - … on computer vision …, 2022 - openaccess.thecvf.com

… We collect 4 billion image and alt-text pairs following the same process as ALIGN [30],
with the same image-based filtering but simpler text-based filtering. Appendix L shows that …

Cited by 458 - Related articles - All 7 versions

[PDF] thecvf.com

Scaling up vision-language pre-training for image captioning

X Hu, Z Gan, J Wang, Z Yang, Z Liu… - … on computer vision …, 2022 - openaccess.thecvf.com

… We remove the alt-text if any of its unigrams cannot be found in the vocabulary. Afterwards,
… 200 million images, each corresponding to one alt-text. The word cloud of 200 most frequent …

Cited by 242 - Related articles - All 5 versions

[PDF] researchgate.net

Scene text detection and recognition: The deep learning era

S Long, X He, C Yao - International Journal of Computer Vision, 2021 - Springer

… With the rise and development of deep learning, computer vision has been tremendously
transformed and reshaped. As an important research area in computer vision, scene text …

Cited by 479 - Related articles - All 8 versions

[PDF] thecvf.com

Adversarial representation learning for text-to-image matching

N Sarafianos, X Xu… - … on computer vision, 2019 - openaccess.thecvf.com

… For many computer vision applications such as image captioning, … and text level is an
essential yet challenging problem. Its challenges originate from the large word variance in the text …

Cited by 222 - Related articles - All 8 versions

Text2LIVE: Text-driven layered image and video editing

O Bar-Tal, D Ofri-Amar, R Fridman, Y Kasten… - … on computer vision, 2022 - Springer

We present a method for zero-shot, text-driven editing of natural images and videos. Given
an image or a video and a text prompt, our goal is to edit the appearance of existing objects (…

Cited by 254 - Related articles - All 4 versions

[PDF] arxiv.org

Improving vision-and-language navigation with image-text pairs from the web

A Majumdar, A Shrivastava, S Lee, P Anderson… - Computer Vision–ECCV …, 2020 - Springer

… As an alternative, we propose learning visual grounding from freely-available internet data,
… alt-text captured in the Conceptual Captions dataset [24], containing around 3.3M image-text …

Cited by 217 - Related articles - All 9 versions

Florence: A new foundation model for computer vision

L Yuan, D Chen, YL Chen, N Codella, X Dai… - arXiv preprint arXiv …, 2021 - arxiv.org

… shared representation, we introduce a new computer vision foundation model, Florence, to
… image-text data, our Florence model can be easily adapted for various computer vision tasks…

Cited by 766 - Related articles - All 2 versions
