The Evolution of Multimodal Model Architectures

SN Wadekar, A Chaurasia, A Chadha… - arXiv preprint arXiv …, 2024 - arxiv.org
This work uniquely identifies and characterizes four prevalent multimodal model
architectural patterns in the contemporary multimodal landscape. Systematically …

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

S Yang, Z Zhong, M Zhao, S Takahashi, M Ishii… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, with realistic generation results and a wide range of personalized
applications, diffusion-based generative models have gained huge attention in both visual and …