Interpretable deep learning: Interpretation, interpretability, trustworthiness, and beyond

X Li, H Xiong, X Li, X Wu, X Zhang, J Liu, J Bian… - … and Information Systems, 2022 - Springer
Deep neural networks have been well-known for their superb handling of various machine
learning and artificial intelligence tasks. However, due to their over-parameterized black-box …

Interpretable and explainable machine learning: a methods‐centric overview with concrete examples

R Marcinkevičs, JE Vogt - Wiley Interdisciplinary Reviews: Data …, 2023 - Wiley Online Library
Interpretability and explainability are crucial for machine learning (ML) and statistical
applications in medicine, economics, law, and natural sciences and form an essential …

Transformer interpretability beyond attention visualization

H Chefer, S Gur, L Wolf - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
Self-attention techniques, and specifically Transformers, are dominating the field of text
processing and are becoming increasingly popular in computer vision classification tasks. In …

Backdoorbench: A comprehensive benchmark of backdoor learning

B Wu, H Chen, M Zhang, Z Zhu, S Wei… - Advances in …, 2022 - proceedings.neurips.cc
Backdoor learning is an emerging and vital topic for studying the vulnerability of deep
neural networks (DNNs). Many pioneering backdoor attack and defense methods are being …

Generic attention-model explainability for interpreting bi-modal and encoder-decoder transformers

H Chefer, S Gur, L Wolf - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Transformers are increasingly dominating multi-modal reasoning tasks, such as visual
question answering, achieving state-of-the-art results thanks to their ability to contextualize …

Diffusion visual counterfactual explanations

M Augustin, V Boreiko, F Croce… - Advances in Neural …, 2022 - proceedings.neurips.cc
Visual Counterfactual Explanations (VCEs) are an important tool to understand the
decisions of an image classifier. They are “small” but “realistic” semantic changes of the …

Which explanation should I choose? A function approximation perspective to characterizing post hoc explanations

T Han, S Srinivas, H Lakkaraju - Advances in neural …, 2022 - proceedings.neurips.cc
A critical problem in the field of post hoc explainability is the lack of a common foundational
goal among methods. For example, some methods are motivated by function approximation …

XAI for transformers: Better explanations through conservative propagation

A Ali, T Schnake, O Eberle… - International …, 2022 - proceedings.mlr.press
Transformers have become an important workhorse of machine learning, with numerous
applications. This necessitates the development of reliable methods for increasing their …

ISTVT: interpretable spatial-temporal video transformer for deepfake detection

C Zhao, C Wang, G Hu, H Chen, C Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the rapid development of Deepfake synthesis technology, our information security and
personal privacy have been severely threatened in recent years. To achieve a robust …

Impossibility theorems for feature attribution

B Bilodeau, N Jaques, PW Koh… - Proceedings of the …, 2024 - National Acad Sciences
Despite a sea of interpretability methods that can produce plausible explanations, the field
has also empirically seen many failure cases of such methods. In light of these results, it …