" Is your explanation stable?" A Robustness Evaluation Framework for Feature Attribution

Y Gan, Y Mao, X Zhang, S Ji, Y Pu, M Han… - Proceedings of the …, 2022 - dl.acm.org
Neural networks have become increasingly popular. Nevertheless, understanding their
decision process turns out to be complicated. One vital method to explain a model's …

Smoothed geometry for robust attribution

Z Wang, H Wang, S Ramkumar… - Advances in neural …, 2020 - proceedings.neurips.cc
Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks
(DNNs), but have recently been shown to be vulnerable to attacks that produce divergent …

A practical upper bound for the worst-case attribution deviations

F Wang, AWK Kong - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Model attribution is a critical component of deep neural networks (DNNs), as it provides
interpretability for complex models. Recent studies have brought attention to the security of …

Shortcomings of top-down randomization-based sanity checks for evaluations of deep neural network explanations

A Binder, L Weber, S Lapuschkin… - Proceedings of the …, 2023 - openaccess.thecvf.com
While the evaluation of explanations is an important step towards trustworthy models, it
needs to be done carefully, and the employed metrics need to be well-understood …

Enhanced regularizers for attributional robustness

A Sarkar, A Sarkar, VN Balasubramanian - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Deep neural networks are the default choice of learning models for computer vision tasks.
Extensive work has been carried out in recent years on explaining deep models for vision …

SoK: Explainable machine learning in adversarial environments

M Noppel, C Wressnegger - 2024 IEEE Symposium on …, 2023 - oaklandsok.github.io
Modern deep learning methods have long been considered black boxes due to the lack of
insights into their decision-making process. However, recent advances in explainable …

Benchmarking attribution methods with relative feature importance

M Yang, B Kim - arXiv preprint arXiv:1907.09701, 2019 - arxiv.org
Interpretability is an important area of research for the safe deployment of machine learning
systems. One particular type of interpretability method attributes model decisions to input …

Good-looking but Lacking Faithfulness: Understanding Local Explanation Methods through Trend-based Testing

J He, K Chen, G Meng, J Zhang, C Li - Proceedings of the 2023 ACM …, 2023 - dl.acm.org
While enjoying the great achievements brought by deep learning (DL), people are also
worried about the decisions made by DL models, since the high degree of non-linearity of DL …

Measurably stronger explanation reliability via model canonization

F Motzkus, L Weber… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
While rule-based attribution methods have proven useful for providing local explanations for
Deep Neural Networks, explaining modern and more varied network architectures yields …

Building reliable explanations of unreliable neural networks: Locally smoothing perspective of model interpretation

D Lim, H Lee, S Kim - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com
We present a novel method for reliably explaining the predictions of neural networks. We
consider an explanation reliable if it identifies input features relevant to the model output by …