The Evolution of Multimodal Model Architectures

SN Wadekar, A Chaurasia, A Chadha… - arXiv preprint arXiv …, 2024 - arxiv.org
This work uniquely identifies and characterizes four prevalent multimodal model
architectural patterns in the contemporary multimodal landscape. Systematically …

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

MULTIZOO & MULTIBENCH: a standardized toolkit for multimodal deep learning

PP Liang, Y Lyu, X Fan, A Agarwal, Y Cheng… - The Journal of Machine …, 2023 - dl.acm.org
Learning multimodal representations involves integrating information from multiple
heterogeneous sources of data. In order to accelerate progress towards understudied …

TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models

J Jia, Y Hu, X Weng, Y Shi, M Li, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present TinyLLaVA Factory, an open-source modular codebase for small-scale large
multimodal models (LMMs) with a focus on simplicity of code implementations, extensibility …

Leveraging hierarchy in multimodal generative models for effective cross-modality inference

M Vasco, H Yin, FS Melo, A Paiva - Neural Networks, 2022 - Elsevier
This work addresses the problem of cross-modality inference (CMI), i.e., inferring missing
data of unavailable perceptual modalities (e.g., sound) using data from available perceptual …

Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities

S Munikoti, I Stewart, S Horawalavithana… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal models are expected to be a critical component to future advances in artificial
intelligence. This field is starting to grow rapidly with a surge of new design elements …

Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2023 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Building the Next Generation of Multimodal Models

G Ilharco - 2024 - digital.lib.washington.edu
One of the fundamental goals of machine learning is to create systems capable of
processing data from a variety of modalities such as images and text. I argue that the next …

Multibench: Multiscale benchmarks for multimodal representation learning

PP Liang, Y Lyu, X Fan, Z Wu, Y Cheng… - Advances in neural …, 2021 - ncbi.nlm.nih.gov
Learning multimodal representations involves integrating information from multiple
heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world …

Multimodal Infusion Tuning for Large Models

H Sun, Y Song, J Hu, X Yu, J Liu, YW Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large-scale models have showcased remarkable generalization
capabilities in various tasks. However, integrating multimodal processing into these models …