Sunny and dark outside?! Improving answer consistency in VQA through entailed question generation

L Yao, Z Chu, S Li, Y Li, J Gao, A Zhang - ACM Transactions on …, 2021 - dl.acm.org

Causal inference is a critical research topic across many domains, such as statistics,
computer science, education, public policy, and economics, for decades. Nowadays …

Cited by 449 · Related articles · All 6 versions

[PDF] thecvf.com

Teaching structured vision & language concepts to vision & language models

S Doveh, A Arbelle, S Harary… - Proceedings of the …, 2023 - openaccess.thecvf.com

Vision and Language (VL) models have demonstrated remarkable zero-shot performance in
a variety of tasks. However, some aspects of complex language understanding still remain a …

Cited by 42 · Related articles · All 8 versions

[PDF] mlr.press

Learning de-biased representations with biased representations

H Bahng, S Chun, S Yun, J Choo… - … on Machine Learning, 2020 - proceedings.mlr.press

Many machine learning algorithms are trained and evaluated by splitting data from a single
source into training and test sets. While such focus on in-distribution learning scenarios has …

Cited by 270 · Related articles · All 11 versions

Visual language integration: A survey and open challenges

SM Park, YG Kim - Computer Science Review, 2023 - Elsevier

With the recent development of deep learning technology comes the wide use of artificial
intelligence (AI) models in various domains. AI shows good performance for definite …

Cited by 7 · Related articles · All 2 versions

[PDF] arxiv.org

Benchmarking spatial relationships in text-to-image generation

T Gokhale, H Palangi, B Nushi, V Vineet… - arXiv preprint arXiv …, 2022 - arxiv.org

Spatial understanding is a fundamental aspect of computer vision and integral for human-level
reasoning about images, making it an important component for grounded language …

Cited by 45 · Related articles · All 2 versions

[PDF] thecvf.com

Biaswap: Removing dataset bias with bias-tailored swapping augmentation

E Kim, J Lee, J Choo - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

Deep neural networks often make decisions based on the spurious correlations inherent in
the dataset, failing to generalize in an unbiased data distribution. Although previous …

Cited by 66 · Related articles · All 6 versions

[PDF] arxiv.org

Mutant: A training paradigm for out-of-distribution generalization in visual question answering

T Gokhale, P Banerjee, C Baral, Y Yang - arXiv preprint arXiv:2009.08566, 2020 - arxiv.org

While progress has been made on the visual question answering leaderboards, models
often utilize spurious correlations and priors in datasets under the iid setting. As such …

Cited by 132 · Related articles · All 9 versions

[PDF] arxiv.org

Enhancing self-consistency and performance of pre-trained language models through natural language inference

E Mitchell, JJ Noh, S Li, WS Armstrong… - arXiv preprint arXiv …, 2022 - arxiv.org

While large pre-trained language models are powerful, their predictions often lack logical
consistency across test inputs. For example, a state-of-the-art Macaw question-answering …

Cited by 36 · Related articles · All 4 versions

[PDF] arxiv.org

Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models

H Lovenia, W Dai, S Cahyawijaya, Z Ji… - arXiv preprint arXiv …, 2023 - arxiv.org

Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …

Cited by 22 · Related articles · All 2 versions

[PDF] google.com

Vqamix: Conditional triplet mixup for medical visual question answering

H Gong, G Chen, M Mao, Z Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Medical visual question answering (VQA) aims to correctly answer a clinical question related
to a given medical image. Nevertheless, owing to the expensive manual annotations of …

Cited by 29 · Related articles · All 4 versions
