Learning tfidf enhanced joint embedding for recipe-image cross-modal retrieval service

M Shukor, G Couairon, A Grechka… - Proceedings of the …, 2022 - openaccess.thecvf.com

Cross-modal image-recipe retrieval has gained significant attention in recent years. Most
work focuses on improving cross-modal embeddings using unimodal encoders, that allow …

Cited by 17 Related articles All 7 versions

[PDF] archive.org

Improving Cross-Modal Recipe Retrieval with Component-Aware Prompted CLIP Embedding

X Huang, J Liu, Z Zhang, Y Xie - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Cross-modal recipe retrieval is an emerging visual-textual retrieval task, which aims at
matching food images with the corresponding recipes. Although large-scale Vision …

Cited by 3 Related articles All 2 versions

[PDF] thecvf.com

Fine-Grained Alignment for Cross-Modal Recipe Retrieval

M Wahed, X Zhou, T Yu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Vision-language pre-trained models have exhibited significant advancements in various
multimodal and unimodal tasks in recent years, including cross-modal recipe retrieval …

Cited by 2 Related articles All 5 versions

[PDF] arxiv.org

Vision and structured-language pretraining for cross-modal food retrieval

M Shukor, N Thome, M Cord - Computer Vision and Image Understanding, 2024 - Elsevier

Vision-Language Pretraining (VLP) and Foundation models have been the go-to
recipe for achieving SoTA performance on general benchmarks. However, leveraging these …

Cited by 3 Related articles All 3 versions

[PDF] tandfonline.com

Exploring latent weight factors and global information for food-oriented cross-modal retrieval

W Zhao, D Zhou, B Cao, W Liang, N Sukhija - Connection Science, 2023 - Taylor & Francis

Food-oriented cross-modal retrieval aims to retrieve relevant recipes given food images or
vice versa. The modality semantic gap between recipes and food images (text and image …

Cited by 1 Related articles All 2 versions

[PDF] arxiv.org

CAR: consolidation, augmentation and regulation for recipe retrieval

F Song, B Zhu, Y Hao, S Wang, X He - arXiv preprint arXiv:2312.04763, 2023 - arxiv.org

Learning recipe and food image representation in common embedding space is non-trivial
but crucial for cross-modal recipe retrieval. In this paper, we propose CAR framework with …

Cited by 2 Related articles All 2 versions

Cross-modal Recipe Retrieval with Fine-grained Prompting Alignment and Evidential Semantic Consistency

X Huang, J Liu, Z Zhang, Y Xie, Y Tang… - IEEE Transactions …, 2024 - ieeexplore.ieee.org

Alignment between the food images and the corresponding recipes is an emerging
cross-modal representation learning task. In this task, the recipes are composed of three …

Form generative approach for front face design of electric vehicle under female aesthetic preferences

B Yuan, K Wu, X Wu, C Yang - Advanced Engineering Informatics, 2024 - Elsevier

Vehicles are the most representative product of both transportation and industry. Fueled by
the growing popularity of energy-saving and environmentally friendly ideas and policies …

[PDF] mdpi.com

Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval

Z Zou, X Zhu, Q Zhu, H Zhang, L Zhu - Foods, 2024 - mdpi.com

As a prominent topic in food computing, cross-modal recipe retrieval has garnered
substantial attention. However, the semantic alignment across food images and recipes …

Video Frame-wise Explanation Driven Contrastive Learning for Procedural Text Generation

Z Wang, L Li, Z Xie, C Liu - Computer Vision and Image Understanding, 2024 - Elsevier

Procedural text generation from visual observation of instructional videos, such as
assembling, biochemical experiments, and cooking, is an essential task for scene …
