C You, Y Mint, W Dai, JS Sekhon… - 2024 IEEE/CVF …, 2024 - ieeexplore.ieee.org
Fine-tuning pre-trained vision-language models, like CLIP, has yielded success on diverse
downstream tasks. However, several pain points persist for this paradigm: (i) directly tuning …