H Sakaino - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Abstract: Vision-Language models (VLMs), i.e., image-text pairs of CLIP, have boosted image-based
Deep Learning (DL). Moreover, Visual-Question-Answer (VQA) tools and open …