Towards Robust Visual Understanding: from Recognition to Reasoning

T Gokhale - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Models that learn from data are widely and rapidly being deployed today for real-world use, but they suffer from unforeseen failures due to distribution shift, adversarial …

VisionCam: A Comprehensive XAI Toolkit for Interpreting Image-Based Deep Learning Models

W Abdullah, A Tolba, A Elmasry… - Sustainable Machine …, 2024 - sciencesforce.com
Artificial intelligence (AI), a rapidly developing technology, has revolutionized various aspects of our lives. However, many AI models' complex inner workings are still unknown …

DrML: Diagnosing and rectifying vision models using language

Y Zhang, JZ HaoChen, SC Huang… - NeurIPS ML Safety …, 2022 - openreview.net
Recent multi-modal contrastive learning models have demonstrated the ability to learn an
embedding space suitable for building strong vision classifiers, by leveraging the rich …

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

W An, F Tian, S Leng, J Nie, H Lin, QY Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) face the prevalent problem of object hallucination, where the …

Mitigating object hallucinations in large vision-language models through visual contrastive decoding

S Leng, H Zhang, G Chen, X Li, S Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent …

Comparison Visual Instruction Tuning

W Lin, MJ Mirza, S Doveh, R Feris, R Giryes… - arXiv preprint arXiv …, 2024 - arxiv.org
Comparing two images in terms of Commonalities and Differences (CaD) is a fundamental
human capability that forms the basis of advanced visual reasoning and interpretation. It is …

Improving Just Noticeable Difference Model by Leveraging Temporal HVS Perception Characteristics

H Yin, Y Xing, G Xia, X Huang, C Yan - … , South Korea, January 5–8, 2020 …, 2020 - Springer
Temporal HVS characteristics are not fully exploited in conventional JND models. In this
paper, we improve the spatio-temporal JND model by fully leveraging the temporal HVS …

Diffusion-based Visual Counterfactual Explanations--Towards Systematic Quantitative Evaluation

P Vaeth, AM Fruehwald, B Paassen… - arXiv preprint arXiv …, 2023 - arxiv.org
Latest methods for visual counterfactual explanations (VCE) harness the power of deep
generative models to synthesize new examples of high-dimensional images of impressive …

CoCoT: Contrastive chain-of-thought prompting for large multimodal models with multiple image inputs

D Zhang, J Yang, H Lyu, Z Jin, Y Yao, M Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
When exploring the development of Artificial General Intelligence (AGI), a critical task for
these models involves interpreting and processing information from multiple image inputs …