A Suhr, M Lewis, J Yeh, Y Artzi - … of the 55th Annual Meeting of the …, 2017 - aclanthology.org
We present a new visual reasoning language dataset, containing 92,244 pairs of examples of natural statements grounded in synthetic images with 3,962 unique sentences. We …
Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level …
R Saqur, K Narasimhan - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Compositional generalization is a key challenge in grounding natural language to visual perception. While deep learning models have achieved great success in multimodal tasks …
Recent visuolinguistic pre-trained models show promising progress on various end tasks such as image retrieval and video captioning. Yet, they fail miserably on the recently …
Existing visual reasoning datasets such as Visual Question Answering (VQA), often suffer from biases conditioned on the question, image or answer distributions. The recently …
Recurrent neural networks have recently been used for learning to describe images using natural language. However, it has been observed that these models generalize poorly to …
We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call Winoground. Given two …
S Gu, C Clark, A Kembhavi - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Many high-level skills that are required for computer vision tasks, such as parsing questions, comparing and contrasting semantics, and writing descriptions, are also required in other …
F Sammani, T Mukherjee… - proceedings of the …, 2022 - openaccess.thecvf.com
Natural language explanation (NLE) models aim at explaining the decision-making process of a black box system via generating natural language sentences which are human-friendly …