Transformer decoders with multimodal regularization for cross-modal food retrieval

M Shukor, G Couairon, A Grechka… - Proceedings of the …, 2022 - openaccess.thecvf.com
Cross-modal image-recipe retrieval has gained significant attention in recent years. Most
work focuses on improving cross-modal embeddings using unimodal encoders, that allow …

Improving Cross-Modal Recipe Retrieval with Component-Aware Prompted CLIP Embedding

X Huang, J Liu, Z Zhang, Y Xie - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Cross-modal recipe retrieval is an emerging visual-textual retrieval task, which aims at
matching food images with the corresponding recipes. Although large-scale Vision …

Fine-Grained Alignment for Cross-Modal Recipe Retrieval

M Wahed, X Zhou, T Yu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Vision-language pre-trained models have exhibited significant advancements in various
multimodal and unimodal tasks in recent years, including cross-modal recipe retrieval …

CREAMY: Cross-Modal Recipe Retrieval By Avoiding Matching Imperfectly

Z Zou, X Zhu, Q Zhu, Y Liu, L Zhu - IEEE Access, 2024 - ieeexplore.ieee.org
State-of-the-art methods for cross-modal recipe retrieval failed to consider an underlying but
challenging issue, i.e., matching imperfectly problem hidden in positive image-recipe pairs …

Vision and structured-language pretraining for cross-modal food retrieval

M Shukor, N Thome, M Cord - Computer Vision and Image Understanding, 2024 - Elsevier
Vision-Language Pretraining (VLP) and Foundation models have been the go-to
recipe for achieving SoTA performance on general benchmarks. However, leveraging these …

Overview of Memotion 3: Sentiment and emotion analysis of codemixed Hinglish memes

S Mishra, S Suryavardan, M Chakraborty… - arXiv preprint arXiv …, 2023 - arxiv.org
Analyzing memes on the internet has emerged as a crucial endeavor due to the impact this
multi-modal form of content wields in shaping online discourse. Memes have become a …

Cross-modal Recipe Retrieval with Fine-grained Prompting Alignment and Evidential Semantic Consistency

X Huang, J Liu, Z Zhang, Y Xie, Y Tang… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Alignment between the food images and the corresponding recipes is an emerging
cross-modal representation learning task. In this task, the recipes are composed of three …

Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval

Z Zou, X Zhu, Q Zhu, H Zhang, L Zhu - Foods, 2024 - mdpi.com
As a prominent topic in food computing, cross-modal recipe retrieval has garnered
substantial attention. However, the semantic alignment across food images and recipes …

A Survey of Cross-Modal Retrieval Research

侯腾达, 金冉, 王晏祎, 蒋义凯 - Journal of Computer …, 2022 - search.ebscohost.com
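In recent years, media data of various types, such as audio, text, images, and video, has grown explosively on the internet,
and different types of data are often used to describe the same event or topic. Cross-modal retrieval provides some effective methods …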

Cross-Modal Interaction via Reinforcement Feedback for Audio-Lyrics Retrieval

D Zhou, F Lei, L Li, Y Zhou… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
The task of retrieving audio content relevant to lyric queries and vice versa plays a critical
role in music-oriented applications. In this process, robust feature representations have to be …