Visualizing Invariant Features in Vision Models

F Sammani, B Joukovsky… - 2023 24th International …, 2023 - ieeexplore.ieee.org
Explainable AI is important for improving transparency, accountability, trust, and ethical
considerations in AI systems, and for enabling users to make informed decisions based on …

Boosting Cross-task Transferability of Adversarial Patches with Visual Relations

T Ma, S Li, Y Xiao, S Liu - arXiv preprint arXiv:2304.05402, 2023 - arxiv.org
The transferability of adversarial examples is a crucial aspect of evaluating the robustness of
deep learning systems, particularly in black-box scenarios. Although several methods have …

Deem: Diffusion models serve as the eyes of large language models for image perception

R Luo, Y Li, L Chen, W He, TE Lin, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The development of large language models (LLMs) has significantly advanced the
emergence of large multimodal models (LMMs). While LMMs have achieved tremendous …

VGA: Vision GUI Assistant--Minimizing Hallucinations through Image-Centric Fine-Tuning

Z Meng, Y Dai, Z Gong, S Guo, M Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in Large Vision-Language Models (LVLMs) have significantly improved
performance in image comprehension tasks, such as formatted charts and rich-content …

Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning

R Hu, Y Tu, J Sang - arXiv preprint arXiv:2404.10332, 2024 - arxiv.org
Despite achieving outstanding performance on various cross-modal tasks, current large
vision-language models (LVLMs) still suffer from hallucination issues, manifesting as …

Image Authenticity Detection using Eye Gazing Data: A Performance Comparison Beyond Human Capabilities via Attention Mechanism, ResNet, and Cascade …

P Zhang - openreview.net
In the digital age, determining the authenticity of images has become increasingly crucial.
This study aims to explore the capability of machine learning models in identifying …

Detecting adversarial perturbations in multi-task perception

M Klingner, VR Kumar, S Yogamani… - 2022 IEEE/RSJ …, 2022 - ieeexplore.ieee.org
While deep neural networks (DNNs) achieve impressive performance on environment
perception tasks, their sensitivity to adversarial perturbations limits their use in practical …

Detecting and preventing hallucinations in large vision language models

A Gunjal, J Yin, E Bas - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Instruction-tuned Large Vision Language Models (LVLMs) have significantly advanced in
generalizing across a diverse set of multi-modal tasks, especially for Visual Question …

Ibd: Alleviating hallucinations in large vision-language models via image-biased decoding

L Zhu, D Ji, T Chen, P Xu, J Ye, J Liu - arXiv preprint arXiv:2402.18476, 2024 - arxiv.org
Despite rapid development and widespread application, Large Vision-
Language Models (LVLMs) confront a serious challenge of being prone to generating …

Hal-eval: A universal and fine-grained hallucination evaluation framework for large vision language models

C Jiang, W Ye, M Dong, H Jia, H Xu, M Yan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Vision Language Models exhibit remarkable capabilities but struggle with
hallucinations: inconsistencies between images and their descriptions. Previous …