JY Koh, R Salakhutdinov, D Fried - Proceedings of the 40th International …, 2023 - dl.acm.org
We propose an efficient method to ground pretrained text-only language models to the
visual domain, enabling them to process arbitrarily interleaved image-and-text data, and …