VL-Few: Vision Language Alignment for Multimodal Few-Shot Meta Learning

H Ma, B Fan, BK Ng, CT Lam - Applied Sciences, 2024 - mdpi.com
Complex real-world tasks, such as visual question answering (VQA), involve models of
different modalities. However, traditional multimodal learning requires a large amount of …