alt text computer vision - Academic Resource Search

Infusion: Inject and attention fusion for multi concept zero-shot text-based video editing

A Khandelwal - … Conference on Computer Vision, 2023 - openaccess.thecvf.com

… Our framework is a low-cost alternative to one-shot tuned models for editing since it does …
This ICCV workshop paper is the Open Access version, provided by the Computer Vision …

Cited by 7 - Related articles - All 5 versions

[PDF] thecvf.com

Shifted diffusion for text-to-image generation

Y Zhou, B Liu, Y Zhu, X Yang… - … on computer vision …, 2023 - openaccess.thecvf.com

… , provided by the Computer Vision Foundation. … text to image generation with attentional
generative adversarial networks. In Proceedings of the IEEE conference on computer vision …

Cited by 29 - Related articles - All 6 versions

[PDF] arxiv.org

Coca: Contrastive captioners are image-text foundation models

J Yu, Z Wang, V Vasudevan, L Yeung… - arXiv preprint arXiv …, 2022 - arxiv.org

… models is of significant interest in computer vision because these models can be quickly …
web-scale alt-text data and annotated images by treating all labels simply as text, seamlessly …

Cited by 1033 - Related articles - All 7 versions

[PDF] stanford.edu

Let's Talk about X: Combining image recognition and eye gaze to support conversation for people with ALS

SK Kane, MR Morris - Proceedings of the 2017 Conference on …, 2017 - dl.acm.org

… based AAC system that uses computer vision to identify objects in … Alternative text entry
methods such as Dasher [24,25] may offer improved performance, but learning an alternative text …

Cited by 17 - Related articles - All 4 versions

[BOOK][B] Computer vision: models, learning, and inference

SJD Prince - 2012 - books.google.com

… That book too is marked by an enormously comprehensive view of computer vision, by …
summary of the state of the art in computer vision, the frontier of our knowledge and abilities, …

Cited by 1035 - Related articles - All 9 versions

[PDF] nowpublishers.com

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - … Trends® in Computer …, 2022 - nowpublishers.com

… text inputs. In Section 4, we discuss how core computer vision tasks can be viewed as image-text
… by contrastively pre-trained image-text models (such as CLIP [326]), and further enable …

Cited by 142 - Related articles - All 7 versions

[PDF] thecvf.com

Multi-modal representation learning with text-driven soft masks

J Park, B Han - … IEEE/CVF Conference on Computer Vision …, 2023 - openaccess.thecvf.com

… believed to harm vision-language models [16, 39] by breaking the semantic consistency
of the image-text pair, but we argue that this does not hold for the large-scale vision-language …

Cited by 4 - Related articles - All 7 versions

[PDF] thecvf.com

Ld-znet: A latent diffusion approach for text-based image segmentation

K Pnvr, B Singh, P Ghosh… - … on Computer Vision, 2023 - openaccess.thecvf.com

… have received less attention compared to alternative unsupervised methods [4], because of
… the internal features of a text-to-image LDM [38] for text based image segmentation, which is …

Cited by 9 - Related articles - All 7 versions

[PDF] thecvf.com

Transform and tell: Entity-aware news image captioning

A Tran, A Mathews, L Xie - … conference on computer vision …, 2020 - openaccess.thecvf.com

We propose an end-to-end model which generates captions for images embedded in news
articles. News images present two key challenges: they rely on real-world knowledge, …

Cited by 99 - Related articles - All 7 versions

[BOOK][B] Markov random field modeling in computer vision

SZ Li - 2012 - books.google.com

… in image processing and computer vision is to capture the … in image processing, computer
vision, applied statistics, and … of Markov random fields to computer vision problems such as …

Cited by 2143 - Related articles - All 6 versions
