Within the vast expanse of computerized language processing, a revolutionary entity known as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …
H Zhang, X Li, L Bing - arXiv preprint arXiv:2306.02858, 2023 - arxiv.org
We present Video-LLaMA, a multi-modal framework that empowers Large Language Models (LLMs) with the capability of understanding both visual and auditory content in the video …
Based on powerful Large Language Models (LLMs), recent generative Multimodal Large Language Models (MLLMs) have gained prominence as a pivotal research area, exhibiting …
Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened …
K Li, Y Wang, Y He, Y Li, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
With the rapid development of Multi-modal Large Language Models (MLLMs) a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities …
We present Unified-IO 2 a multimodal and multi-skill unified model capable of following novel instructions. Unified-IO 2 can use text images audio and/or videos as input and can …
The Large Vision-Language Model (LVLM) has enhanced the performance of various downstream tasks in visual-language understanding. Most existing approaches encode …
B Li, Y Ge, Y Ge, G Wang, R Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) building upon the foundation of powerful large language models (LLMs) have recently demonstrated exceptional capabilities in generating …
In the past decade, autonomous driving has experienced rapid development in both academia and industry. However, its limited interpretability remains a significant unsolved …