S Chowdhury, S Nag, S Dasgupta, J Chen… - … on Computer Vision, 2024 - Springer
Abstract: Leveraging Large Language Models' remarkable proficiency in text-based tasks,
recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and …