Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs

Y Li, B Hu, W Wang, X Cao, M Zhang - arXiv preprint arXiv:2311.15759, 2023 - arxiv.org
Recent advancements in multimodal large language models (MLLMs) have achieved
significant multimodal generation capabilities, akin to GPT-4. These models predominantly …

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

S Tong, E Brown, P Wu, S Woo, M Middepogu… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …

Shared cross-modal trajectory prediction for autonomous driving

C Choi, JH Choi, J Li, S Malla - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Predicting future trajectories of traffic agents in highly interactive environments is an
essential and challenging problem for the safe operation of autonomous driving systems. On …

Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - arXiv preprint arXiv:2404.12736, 2024 - arxiv.org
The rapid advancements in pre-trained Large Language Models (LLMs) and Large
Multimodal Models (LMMs) have ushered in a new era of intelligent applications …

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

D Liu, R Zhang, L Qiu, S Huang, W Lin, S Zhao… - Forty-first International … - openreview.net
We propose SPHINX-X, an extensive Multi-modality Large Language Model (MLLM) series
developed upon SPHINX. To improve the architecture and training efficiency, we modify the …

Vision language models in autonomous driving: A survey and outlook

X Zhou, M Liu, E Yurtsever, BL Zagar… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
The applications of Vision-Language Models (VLMs) in the field of Autonomous Driving (AD)
have attracted widespread attention due to their outstanding performance and the ability to …

Scalability in perception for autonomous driving: Waymo open dataset

P Sun, H Kretzschmar, X Dotiwalla… - Proceedings of the …, 2020 - openaccess.thecvf.com
The research community has increasing interest in autonomous driving research, despite
the resource intensity of obtaining representative real-world data. Existing self-driving …

From Image to Video, what do we need in multimodal LLMs?

S Huang, H Zhang, Y Gao, Y Hu, Z Qin - arXiv preprint arXiv:2404.11865, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have demonstrated profound capabilities in
understanding multimodal information, ranging from Image LLMs to the more complex …

MLLM-Bench: Evaluating Multi-modal LLMs Using GPT-4V

W Ge, S Chen, G Chen, J Chen, Z Chen, S Yan… - arXiv preprint arXiv …, 2023 - arxiv.org
In the pursuit of Artificial General Intelligence (AGI), the integration of vision in language
models has marked a significant milestone. The advent of vision-language models (MLLMs) …

The curious case of nonverbal abstract reasoning with multi-modal large language models

K Ahrabian, Z Sourati, K Sun, J Zhang, Y Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
While large language models (LLMs) are still being adopted to new domains and utilized in
novel applications, we are experiencing an influx of the new generation of foundation …