Does localization inform editing? Surprising differences in causality-based localization vs. knowledge editing in language models

P Hase, M Bansal, B Kim… - Advances in Neural …, 2024 - proceedings.neurips.cc
Language models learn a great quantity of factual information during pretraining,
and recent work localizes this information to specific model weights like mid-layer MLP …

CRAFT: Concept recursive activation factorization for explainability

T Fel, A Picard, L Bethune, T Boissin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Attribution methods are a popular class of explainability methods that use heatmaps to
depict the most important areas of an image that drive a model decision. Nevertheless …

Concept embedding analysis: A review

G Schwalbe - arXiv preprint arXiv:2203.13909, 2022 - arxiv.org
Deep neural networks (DNNs) have found their way into many applications with potential
impact on the safety, security, and fairness of human-machine systems. Such applications require basic …

Interpretable image recognition by constructing transparent embedding space

J Wang, H Liu, X Wang, L Jing - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Humans usually explain their reasoning (e.g., classification) by dissecting the image and
pointing out the evidence from these parts to the concepts in their minds. Inspired by this …

Text-to-concept (and back) via cross-model alignment

M Moayeri, K Rezaei, M Sanjabi… - … on Machine Learning, 2023 - proceedings.mlr.press
We observe that the mapping from an image's representation in one model to its
representation in another can be learned surprisingly well with just a linear layer, even …

A holistic approach to unifying automatic concept extraction and concept importance estimation

T Fel, V Boutin, L Béthune, R Cadène… - Advances in …, 2024 - proceedings.neurips.cc
In recent years, concept-based approaches have emerged as some of the most promising
explainability methods to help us interpret the decisions of Artificial Neural Networks (ANNs) …

Discover-then-name: Task-agnostic concept bottlenecks via automated concept discovery

S Rao, S Mahajan, M Böhle, B Schiele - European Conference on …, 2024 - Springer
Concept Bottleneck Models (CBMs) have recently been proposed to address the
'black-box' problem of deep neural networks, by first mapping images to a human …

SoK: Explainable machine learning in adversarial environments

M Noppel, C Wressnegger - 2024 IEEE Symposium on Security …, 2024 - ieeexplore.ieee.org
Modern deep learning methods have long been considered black boxes due to the lack of
insights into their decision-making process. However, recent advances in explainable …

Scalable interpretability via polynomials

A Dubey, F Radenovic… - Advances in neural …, 2022 - proceedings.neurips.cc
Generalized Additive Models (GAMs) have quickly become the leading choice for
interpretable machine learning. However, unlike uninterpretable methods such as DNNs …

Disentangled explanations of neural network predictions by finding relevant subspaces

P Chormai, J Herrmann, KR Müller… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Explainable AI aims to overcome the black-box nature of complex ML models like neural
networks by generating explanations for their predictions. Explanations often take the form of …