Connecting text and visual modalities plays an essential role in generative intelligence. For this reason, inspired by the success of large language models, significant research efforts …
This paper explores the emerging knowledge-driven autonomous driving technologies. Our investigation highlights the limitations of current autonomous driving systems, in particular …
The rise of large foundation models, trained on extensive datasets, is revolutionizing the field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …
The revolutionary capabilities of large language models (LLMs) have paved the way for multimodal large language models (MLLMs) and fostered diverse applications across …
Robots powered by 'blackbox' models need to provide human-understandable explanations that we can trust. Hence, explainability plays a critical role in trustworthy autonomous …
The emergence of Multimodal Large Language Models ((M)LLMs) has ushered in new avenues in artificial intelligence, particularly for autonomous driving by offering enhanced …
S Luo, W Chen, W Tian, R Liu, L Hou… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems. In the …
Vision language models (VLMs) are an exciting emerging class of language models (LMs) that have merged classic LM capabilities with those of image processing systems. However …
C Pang, J Wu, J Li, Y Liu, J Sun, W Li, X Weng… - arXiv preprint arXiv …, 2024 - arxiv.org
Generic large Vision-Language Models (VLMs) are developing rapidly but still perform poorly in the Remote Sensing (RS) domain, which is due to the unique and specialized nature …