Learning text-image joint embedding for efficient cross-modal retrieval with deep feature...

M Shukor, G Couairon, A Grechka… - Proceedings of the …, 2022 - openaccess.thecvf.com

Cross-modal image-recipe retrieval has gained significant attention in recent years. Most
work focuses on improving cross-modal embeddings using unimodal encoders, that allow …

Cited by 17 · Related articles · All 7 versions

[PDF] archive.org

Improving Cross-Modal Recipe Retrieval with Component-Aware Prompted CLIP Embedding

X Huang, J Liu, Z Zhang, Y Xie - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Cross-modal recipe retrieval is an emerging visual-textual retrieval task, which aims at
matching food images with the corresponding recipes. Although large-scale Vision …

Cited by 3 · Related articles · All 2 versions

[PDF] thecvf.com

Fine-Grained Alignment for Cross-Modal Recipe Retrieval

M Wahed, X Zhou, T Yu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Vision-language pre-trained models have exhibited significant advancements in various
multimodal and unimodal tasks in recent years, including cross-modal recipe retrieval …

Cited by 2 · Related articles · All 5 versions

[PDF] ieee.org

CREAMY: Cross-Modal Recipe Retrieval By Avoiding Matching Imperfectly

Z Zou, X Zhu, Q Zhu, Y Liu, L Zhu - IEEE Access, 2024 - ieeexplore.ieee.org

State-of-the-art methods for cross-modal recipe retrieval failed to consider an underlying but
challenging issue, i.e., matching imperfectly problem hidden in positive image-recipe pairs …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Vision and structured-language pretraining for cross-modal food retrieval

M Shukor, N Thome, M Cord - Computer Vision and Image Understanding, 2024 - Elsevier

Vision-Language Pretraining (VLP) and Foundation models have been the go-to
recipe for achieving SoTA performance on general benchmarks. However, leveraging these …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

Overview of Memotion 3: Sentiment and emotion analysis of codemixed Hinglish memes

S Mishra, S Suryavardan, M Chakraborty… - arXiv preprint arXiv …, 2023 - arxiv.org

Analyzing memes on the internet has emerged as a crucial endeavor due to the impact this
multi-modal form of content wields in shaping online discourse. Memes have become a …

Cited by 2 · Related articles · All 6 versions

Cross-modal Recipe Retrieval with Fine-grained Prompting Alignment and Evidential Semantic Consistency

X Huang, J Liu, Z Zhang, Y Xie, Y Tang… - IEEE Transactions …, 2024 - ieeexplore.ieee.org

Alignment between the food images and the corresponding recipes is an emerging cross-
modal representation learning task. In this task, the recipes are composed of three …

[PDF] mdpi.com

Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval

Z Zou, X Zhu, Q Zhu, H Zhang, L Zhu - Foods, 2024 - mdpi.com

As a prominent topic in food computing, cross-modal recipe retrieval has garnered
substantial attention. However, the semantic alignment across food images and recipes …

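A Survey of Cross-Modal Retrieval Research.

侯腾达, 金冉, 王晏祎, 蒋义凯 - Journal of Computer …, 2022 - search.ebscohost.com

In recent years, media data of all types, such as audio, text, images, and video, has grown
explosively on the internet, and different types of data are often used to describe the same event or topic. Cross-modal retrieval provides some effective methods …

Cited by 1 · Related articles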

Cross-Modal Interaction via Reinforcement Feedback for Audio-Lyrics Retrieval

D Zhou, F Lei, L Li, Y Zhou… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

The task of retrieving audio content relevant to lyric queries and vice versa plays a critical
role in music-oriented applications. In this process, robust feature representations have to be …
