R Chawla, A Datta, T Verma, A Jha, A Gautam… - arXiv preprint arXiv …, 2024 - arxiv.org
Researchers in artificial intelligence have recently taken a strong interest in how language and
vision come together, leading to the development of multimodal models that aim to …