All versions - Academic Resource Search

Articles

About 3 results (0.02 sec)

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

Cited by 37

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

J Lu, C Clark, S Lee, Z Zhang, S Khosla… - arXiv e …, 2023 - ui.adsabs.harvard.edu

We present Unified-IO 2, the first autoregressive multimodal model that is capable of
understanding and generating image, text, audio, and action. To unify different modalities …

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

J Lu, C Clark, S Lee, Z Zhang, S Khosla… - arXiv preprint arXiv …, 2023 - arxiv.org

We present Unified-IO 2, the first autoregressive multimodal model that is capable of
understanding and generating image, text, audio, and action. To unify different modalities …
