J Jiang, N Zheng - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Recently, finetuning pretrained vision-language models (VLMs) has become a prevailing
paradigm for achieving state-of-the-art performance in VQA. However, as VLMs scale, it …