P Xu, W Shao, K Zhang, P Gao, S Liu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Large Vision-Language Models (LVLMs) have recently played a dominant role in multimodal vision-language learning. Despite this great success, they lack a holistic evaluation …
Autonomous driving is an active research topic in both academia and industry. However, most existing solutions focus on improving accuracy by training learnable models …
Vision-language pre-training (VLP) has shown impressive performance on a wide range of cross-modal tasks, where VLP models without reliance on object detectors are becoming the …
S Lu, Z Liu, T Liu, W Zhou - Engineering Applications of Artificial …, 2023 - Elsevier
Medical Vision-and-Language Pre-training (MedVLP), which learns generic vision-language representations from medical images and texts to benefit various downstream …
Visual Question Answering (VQA) is a challenging task that requires systems to provide accurate answers to questions based on image content. Current VQA models struggle with …
Visual reasoning is dominated by end-to-end neural networks scaled to billions of model parameters and training examples. However, even the largest models struggle with …
The intersection of vision and language is of major interest due to the increased focus on seamless integration between recognition and reasoning. Scene graphs (SGs) have …
X Shi, S Lee - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com
When faced with an out-of-distribution (OOD) question or image, visual question answering (VQA) systems may provide unreliable answers. If relied upon by real users or secondary …