Towards Robust Visual Understanding: from Recognition to Reasoning

T Gokhale - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Models that learn from data are widely and rapidly being deployed today for real-world use, but they suffer from unforeseen failures due to distribution shift, adversarial …

VisionCam: A Comprehensive XAI Toolkit for Interpreting Image-Based Deep Learning Models

W Abdullah, A Tolba, A Elmasry… - Sustainable Machine …, 2024 - sciencesforce.com
Artificial intelligence (AI), a rapidly developing technology, has revolutionized various aspects of our lives. However, many AI models' complex inner workings are still unknown …

DrML: Diagnosing and rectifying vision models using language

Y Zhang, JZ HaoChen, SC Huang… - NeurIPS ML Safety …, 2022 - openreview.net
Recent multi-modal contrastive learning models have demonstrated the ability to learn an
embedding space suitable for building strong vision classifiers, by leveraging the rich …

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

W An, F Tian, S Leng, J Nie, H Lin, QY Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) face the prevalent problem of object hallucination, where the …

Mitigating object hallucinations in large vision-language models through visual contrastive decoding

S Leng, H Zhang, G Chen, X Li, S Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent …

Comparison Visual Instruction Tuning

W Lin, MJ Mirza, S Doveh, R Feris, R Giryes… - arXiv preprint arXiv …, 2024 - arxiv.org
Comparing two images in terms of Commonalities and Differences (CaD) is a fundamental
human capability that forms the basis of advanced visual reasoning and interpretation. It is …

Improving Just Noticeable Difference Model by Leveraging Temporal HVS Perception Characteristics

H Yin, Y Xing, G Xia, X Huang, C Yan - … , South Korea, January 5–8, 2020 …, 2020 - Springer
Temporal HVS characteristics are not fully exploited in conventional JND models. In this
paper, we improve the spatio-temporal JND model by fully leveraging the temporal HVS …

Diffusion-based Visual Counterfactual Explanations--Towards Systematic Quantitative Evaluation

P Vaeth, AM Fruehwald, B Paassen… - arXiv preprint arXiv …, 2023 - arxiv.org
Latest methods for visual counterfactual explanations (VCE) harness the power of deep
generative models to synthesize new examples of high-dimensional images of impressive …

CoCoT: Contrastive chain-of-thought prompting for large multimodal models with multiple image inputs

D Zhang, J Yang, H Lyu, Z Jin, Y Yao, M Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
When exploring the development of Artificial General Intelligence (AGI), a critical task for
these models involves interpreting and processing information from multiple image inputs …