J Lu, C Clark, S Lee, Z Zhang, S Khosla… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Unified-IO 2, the first autoregressive multimodal model that is capable of
understanding and generating image, text, audio, and action. To unify different modalities …