Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs

Y Li, B Hu, W Wang, X Cao, M Zhang - arXiv preprint arXiv:2311.15759, 2023 - arxiv.org
Recent advancements in multimodal large language models (MLLMs) have achieved
significant multimodal generation capabilities, akin to GPT-4. These models predominantly …

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

S Tong, E Brown, P Wu, S Woo, M Middepogu… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …

Shared cross-modal trajectory prediction for autonomous driving

C Choi, JH Choi, J Li, S Malla - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Predicting future trajectories of traffic agents in highly interactive environments is an
essential and challenging problem for the safe operation of autonomous driving systems. On …

Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - arXiv preprint arXiv:2404.12736, 2024 - arxiv.org
The rapid advancements in pre-trained Large Language Models (LLMs) and Large
Multimodal Models (LMMs) have ushered in a new era of intelligent applications …

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

D Liu, R Zhang, L Qiu, S Huang, W Lin, S Zhao… - Forty-first International … - openreview.net
We propose SPHINX-X, an extensive Multi-modality Large Language Model (MLLM) series
developed upon SPHINX. To improve the architecture and training efficiency, we modify the …

Vision language models in autonomous driving: A survey and outlook

X Zhou, M Liu, E Yurtsever, BL Zagar… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
The applications of Vision-Language Models (VLMs) in the field of Autonomous Driving (AD)
have attracted widespread attention due to their outstanding performance and the ability to …

Scalability in perception for autonomous driving: Waymo open dataset

P Sun, H Kretzschmar, X Dotiwalla… - Proceedings of the …, 2020 - openaccess.thecvf.com
The research community has increasing interest in autonomous driving research, despite
the resource intensity of obtaining representative real-world data. Existing self-driving …

From Image to Video, what do we need in multimodal LLMs?

S Huang, H Zhang, Y Gao, Y Hu, Z Qin - arXiv preprint arXiv:2404.11865, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have demonstrated profound capabilities in
understanding multimodal information, ranging from Image LLMs to the more complex …

MLLM-Bench: Evaluating Multi-modal LLMs Using GPT-4V

W Ge, S Chen, G Chen, J Chen, Z Chen, S Yan… - arXiv preprint arXiv …, 2023 - arxiv.org
In the pursuit of Artificial General Intelligence (AGI), the integration of vision in language
models has marked a significant milestone. The advent of vision-language models (MLLMs) …

The curious case of nonverbal abstract reasoning with multi-modal large language models

K Ahrabian, Z Sourati, K Sun, J Zhang, Y Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
While large language models (LLMs) are still being adopted to new domains and utilized in
novel applications, we are experiencing an influx of the new generation of foundation …