Transformer decoders with multimodal regularization for cross-modal food retrieval

M Shukor, G Couairon, A Grechka… - Proceedings of the …, 2022 - openaccess.thecvf.com
Cross-modal image-recipe retrieval has gained significant attention in recent years. Most
work focuses on improving cross-modal embeddings using unimodal encoders, that allow …

Improving Cross-Modal Recipe Retrieval with Component-Aware Prompted CLIP Embedding

X Huang, J Liu, Z Zhang, Y Xie - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Cross-modal recipe retrieval is an emerging visual-textual retrieval task, which aims at
matching food images with the corresponding recipes. Although large-scale Vision …

Fine-Grained Alignment for Cross-Modal Recipe Retrieval

M Wahed, X Zhou, T Yu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Vision-language pre-trained models have exhibited significant advancements in various
multimodal and unimodal tasks in recent years, including cross-modal recipe retrieval …

Vision and structured-language pretraining for cross-modal food retrieval

M Shukor, N Thome, M Cord - Computer Vision and Image Understanding, 2024 - Elsevier
Vision-Language Pretraining (VLP) and Foundation models have been the go-to
recipe for achieving SoTA performance on general benchmarks. However, leveraging these …

Exploring latent weight factors and global information for food-oriented cross-modal retrieval

W Zhao, D Zhou, B Cao, W Liang, N Sukhija - Connection Science, 2023 - Taylor & Francis
Food-oriented cross-modal retrieval aims to retrieve relevant recipes given food images or
vice versa. The modality semantic gap between recipes and food images (text and image …

CAR: consolidation, augmentation and regulation for recipe retrieval

F Song, B Zhu, Y Hao, S Wang, X He - arXiv preprint arXiv:2312.04763, 2023 - arxiv.org
Learning recipe and food image representation in common embedding space is non-trivial
but crucial for cross-modal recipe retrieval. In this paper, we propose CAR framework with …

Cross-modal Recipe Retrieval with Fine-grained Prompting Alignment and Evidential Semantic Consistency

X Huang, J Liu, Z Zhang, Y Xie, Y Tang… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Alignment between the food images and the corresponding recipes is an emerging
cross-modal representation learning task. In this task, the recipes are composed of three …

Form generative approach for front face design of electric vehicle under female aesthetic preferences

B Yuan, K Wu, X Wu, C Yang - Advanced Engineering Informatics, 2024 - Elsevier
Vehicles are the most representative product of both transportation and industry. Fueled by
the growing popularity of energy-saving and environmentally friendly ideas and policies …

Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval

Z Zou, X Zhu, Q Zhu, H Zhang, L Zhu - Foods, 2024 - mdpi.com
As a prominent topic in food computing, cross-modal recipe retrieval has garnered
substantial attention. However, the semantic alignment across food images and recipes …

Video Frame-wise Explanation Driven Contrastive Learning for Procedural Text Generation

Z Wang, L Li, Z Xie, C Liu - Computer Vision and Image Understanding, 2024 - Elsevier
Procedural text generation from visual observation of instructional videos, such as
assembling, biochemical experiments, and cooking, is an essential task for scene …