FAME-ViL: Multi-tasking vision-language model for heterogeneous fashion tasks

X Han, X Zhu, L Yu, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including
cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image …

FashionViL: Fashion-focused vision-and-language representation learning

X Han, L Yu, X Zhu, L Zhang, YZ Song… - European conference on …, 2022 - Springer
Large-scale Vision-and-Language (V+L) pre-training for representation learning
has proven to be effective in boosting various downstream V+L tasks. However, when it …

Fashion-GPT: Integrating LLMs with Fashion Retrieval System

Q Chen, T Zhang, M Nie, Z Wang, S Xu, W Shi… - Proceedings of the 1st …, 2023 - dl.acm.org
Although customers on a fashion e-commerce platform express their clothing
preferences through combined imagery and textual information, they are limited to retrieving …

Benchmarking Robustness of Text-Image Composed Retrieval

S Sun, J Gu, S Gong - arXiv preprint arXiv:2311.14837, 2023 - suntongtongtong.github.io
Text-image composed retrieval aims to retrieve the target image through the composed
query, which is specified in the form of an image plus some text that describes desired …
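
The composed-query setup described above (a reference image plus text describing the desired change, used to find a target image) can be sketched as follows. This is a minimal illustration, not the method of any paper in this list: it assumes image and text embeddings already live in a shared space, and it uses a hypothetical weighted-sum fusion (`compose_query`, with an assumed `alpha` weight) followed by cosine-similarity ranking. The toy vectors are made up; real systems learn the encoders and the fusion module.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def compose_query(image_emb, text_emb, alpha=0.5):
    """Hypothetical late-fusion composer: a weighted sum of the normalized
    reference-image and modification-text embeddings (assumption, not a
    published method)."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    return l2_normalize([(1 - alpha) * i + alpha * t for i, t in zip(img, txt)])


def retrieve(image_emb, text_emb, gallery, k=1):
    """Return the ids of the k gallery images most similar to the composed query."""
    q = compose_query(image_emb, text_emb)
    scores = {
        img_id: sum(a * b for a, b in zip(q, l2_normalize(emb)))
        for img_id, emb in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy 3-d embeddings (invented): axis 0 ~ "shirt", axis 1 ~ "red", axis 2 ~ "blue".
gallery = {
    "blue_shirt": [1.0, 0.0, 1.0],
    "red_shirt": [1.0, 1.0, 0.0],
    "red_dress": [0.0, 1.0, 0.0],
}
reference_image = [1.0, 0.0, 1.0]      # a blue shirt
modification_text = [0.0, 1.0, -1.0]   # "make it red instead of blue"
print(retrieve(reference_image, modification_text, gallery, k=1))  # prints ['red_shirt']
```

Summing normalized embeddings is only the simplest baseline; it shows why the text must encode a change relative to the image rather than a standalone description, since the "blue" component of the reference is cancelled by the text vector.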

PCaSM: Text-guided composed image retrieval with parallel content and style modules

J Zhang, J Zhang, H Wu, Z Zhao… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
The query for text-guided image retrieval references two parts: the first part is the image, and
the second part is the text describing the part of the image that needs to be modified. By …

Simplifying Referred Visual Search with Conditional Contrastive Learning

S Lepage, J Mary, D Picard - openreview.net
This paper introduces a new challenge for image similarity search in the context of fashion,
addressing the inherent ambiguity in this domain stemming from complex images. We …