Z Hu, S Li, M Du, A Dhua… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
In e-commerce applications, vision-language multimodal transformer models play a pivotal
role in product search. The key to successfully training a multimodal model lies in the …