X Huang, J Liu, Z Zhang, Y Xie - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Cross-modal recipe retrieval is an emerging visual-textual retrieval task, which aims at matching food images with the corresponding recipes. Although large-scale Vision …
M Wahed, X Zhou, T Yu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Vision-language pre-trained models have exhibited significant advancements in various multimodal and unimodal tasks in recent years, including cross-modal recipe retrieval …
Z Zou, X Zhu, Q Zhu, Y Liu, L Zhu - IEEE Access, 2024 - ieeexplore.ieee.org
State-of-the-art methods for cross-modal recipe retrieval failed to consider an underlying but challenging issue, i.e., matching imperfectly problem hidden in positive image-recipe pairs …
M Shukor, N Thome, M Cord - Computer Vision and Image Understanding, 2024 - Elsevier
Vision-Language Pretraining (VLP) and Foundation models have been the go-to recipe for achieving SoTA performance on general benchmarks. However, leveraging these …
Analyzing memes on the internet has emerged as a crucial endeavor due to the impact this multi-modal form of content wields in shaping online discourse. Memes have become a …
X Huang, J Liu, Z Zhang, Y Xie, Y Tang… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Alignment between the food images and the corresponding recipes is an emerging cross-modal representation learning task. In this task, the recipes are composed of three …
Z Zou, X Zhu, Q Zhu, H Zhang, L Zhu - Foods, 2024 - mdpi.com
As a prominent topic in food computing, cross-modal recipe retrieval has garnered substantial attention. However, the semantic alignment across food images and recipes …
D Zhou, F Lei, L Li, Y Zhou… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
The task of retrieving audio content relevant to lyric queries and vice versa plays a critical role in music-oriented applications. In this process, robust feature representations have to be …