A survey on causal inference

L Yao, Z Chu, S Li, Y Li, J Gao, A Zhang - ACM Transactions on …, 2021 - dl.acm.org
Causal inference is a critical research topic across many domains, such as statistics,
computer science, education, public policy, and economics, for decades. Nowadays …

Teaching structured vision & language concepts to vision & language models

S Doveh, A Arbelle, S Harary… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in
a variety of tasks. However, some aspects of complex language understanding still remain a …

Learning de-biased representations with biased representations

H Bahng, S Chun, S Yun, J Choo… - … on Machine Learning, 2020 - proceedings.mlr.press
Many machine learning algorithms are trained and evaluated by splitting data from a single
source into training and test sets. While such focus on in-distribution learning scenarios has …

Visual language integration: A survey and open challenges

SM Park, YG Kim - Computer Science Review, 2023 - Elsevier
With the recent development of deep learning technology comes the wide use of artificial
intelligence (AI) models in various domains. AI shows good performance for definite …

Benchmarking spatial relationships in text-to-image generation

T Gokhale, H Palangi, B Nushi, V Vineet… - arXiv preprint arXiv …, 2022 - arxiv.org
Spatial understanding is a fundamental aspect of computer vision and integral for human-
level reasoning about images, making it an important component for grounded language …

BiaSwap: Removing dataset bias with bias-tailored swapping augmentation

E Kim, J Lee, J Choo - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Deep neural networks often make decisions based on the spurious correlations inherent in
the dataset, failing to generalize in an unbiased data distribution. Although previous …

MUTANT: A training paradigm for out-of-distribution generalization in visual question answering

T Gokhale, P Banerjee, C Baral, Y Yang - arXiv preprint arXiv:2009.08566, 2020 - arxiv.org
While progress has been made on the visual question answering leaderboards, models
often utilize spurious correlations and priors in datasets under the iid setting. As such …

Enhancing self-consistency and performance of pre-trained language models through natural language inference

E Mitchell, JJ Noh, S Li, WS Armstrong… - arXiv preprint arXiv …, 2022 - arxiv.org
While large pre-trained language models are powerful, their predictions often lack logical
consistency across test inputs. For example, a state-of-the-art Macaw question-answering …

Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models

H Lovenia, W Dai, S Cahyawijaya, Z Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …

VQAMix: Conditional triplet mixup for medical visual question answering

H Gong, G Chen, M Mao, Z Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Medical visual question answering (VQA) aims to correctly answer a clinical question related
to a given medical image. Nevertheless, owing to the expensive manual annotations of …