Microsoft COCO Captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325

S Li, J van de Weijer, T Hu, FS Khan, Q Hou… - arXiv preprint arXiv …, 2023 - arxiv.org

A significant research effort is focused on exploiting the amazing capacities of pretrained
diffusion models for the editing of images. They either finetune the model, or invert the image …

Cited by 41 · All 2 versions

TSingNet: Scale-aware and context-rich feature learning for traffic sign detection and recognition in the wild

Y Liu, J Peng, JH Xue, Y Chen, ZH Fu - Neurocomputing, 2021 - Elsevier

Traffic sign detection and recognition in the wild is a challenging task. Existing techniques
are often incapable of detecting small or occluded traffic signs because of the scale variation …

Cited by 61 · All 3 versions

Language with vision: A study on grounded word and sentence embeddings

H Shahmohammadi, M Heitmeier… - Behavior Research …, 2024 - Springer

Grounding language in vision is an active field of research seeking to construct cognitively
plausible word and sentence representations by incorporating perceptual knowledge from …

Cited by 9 · All 13 versions

A Precise Framework for Rice Leaf Disease Image–Text Retrieval Using FHTW-Net

H Zhou, Y Hu, S Liu, G Zhou, J Xu, A Chen… - Plant …, 2024 - spj.science.org

Cross-modal retrieval for rice leaf diseases is crucial for prevention, providing agricultural
experts with data-driven decision support to address disease threats and safeguard rice …

Cited by 2 · All 5 versions

Diverse and styled image captioning using singular value decomposition‐based mixture of recurrent experts

M Heidari, M Ghatee, A Nickabadi… - Concurrency and …, 2022 - Wiley Online Library

With significant advances in vision and natural language processing, the generation of
image captions becomes a need. Mathews, Xie, and He extended a new model to generate …

Cited by 4 · All 5 versions