Visual Question Answering (VQA) is a significant cross-disciplinary issue in the fields of computer vision and natural language processing that requires a computer to output …
Answering visual queries is a complex task that requires both visual processing and reasoning. End-to-end models, the dominant approach for this task, do not explicitly …
Artificial intelligence (AI) models based on deep learning now represent the state of the art for making functional predictions in genomics research. However, the underlying basis on …
C Huang, O Mees, A Zeng… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Grounding language to the visual observations of a navigating agent can be performed using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image …
S Menon, C Vondrick - arXiv preprint arXiv:2210.07183, 2022 - arxiv.org
Vision-language models (VLMs) such as CLIP have shown promising performance on a variety of recognition tasks using the standard zero-shot classification procedure--computing …
B Fang, J Yu, Z Chen, AI Osman, M Farghali… - Environmental …, 2023 - Springer
The rising amount of waste generated worldwide is inducing issues of pollution, waste management, and recycling, calling for new strategies to improve the waste ecosystem, such …
Thanks to the exponential growth in computing power and vast amounts of data, artificial intelligence (AI) has witnessed remarkable developments in recent years, enabling it to be …
It is a challenging task to learn discriminative representation from images and videos, due to large local redundancy and complex global dependency in these visual data. Convolution …
Purpose: To demonstrate the value of pretraining with millions of radiologic images compared with ImageNet photographic images on downstream medical applications when …