The trustworthiness of machine learning has emerged as a critical topic in the field, encompassing various applications and research areas such as robustness, security …
Concept erasure aims to remove specified features from a representation. It can improve fairness (e.g., preventing a classifier from using gender or race) and interpretability …
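To make the erasure idea concrete, here is a minimal sketch of its simplest linear form: removing the direction along which a binary concept is encoded, via orthogonal projection. The toy data, dimensions, and mean-difference direction are illustrative assumptions, not the method from the snippet, which offers stronger guarantees against linear recovery.

```python
# Minimal sketch of linear concept erasure by orthogonal projection.
# Toy setup: X holds representations, z is a binary protected concept
# (e.g., gender). We remove the direction separating the two concept
# groups -- a simple baseline for the erasure idea described above.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 32
z = rng.integers(0, 2, size=n)                  # binary concept label
X = rng.normal(size=(n, d)) + 2.0 * np.outer(z, rng.normal(size=d))

# Direction along which the concept is linearly encoded:
# the difference between the concept-conditional means.
w = X[z == 1].mean(axis=0) - X[z == 0].mean(axis=0)
w /= np.linalg.norm(w)

# Project every representation onto the orthogonal complement of w.
X_erased = X - np.outer(X @ w, w)

# After erasure, the group means no longer differ along w.
print((X @ w)[z == 1].mean() - (X @ w)[z == 0].mean())                # large
print((X_erased @ w)[z == 1].mean() - (X_erased @ w)[z == 0].mean())  # ~0
```

Projecting out a single mean-difference direction is only a baseline; a linear probe may still recover the concept from other directions, which is the gap stronger erasure methods are designed to close.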
Obtaining human-interpretable explanations of large, general-purpose language models is an urgent goal for AI safety. However, it is just as important that our interpretability methods …
The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning skills. Are these skills general …
T. Räuker, A. Ho, S. Casper, et al. - 2023 IEEE Conference …, 2023 - ieeexplore.ieee.org
The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However …
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories …
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand. This has given rise to numerous efforts towards model explainability in recent …
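As one concrete instance of the explainability efforts this snippet describes, the sketch below computes a gradient × input attribution over token embeddings. The tiny embedding-plus-linear model and random tokens are placeholder assumptions; a real system would apply the same recipe to a trained NLP model.

```python
# Minimal sketch of one common explainability approach for neural NLP:
# gradient x input attribution over token embeddings. The tiny model
# and random inputs below are placeholders for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim, seq_len = 100, 16, 8

embed = nn.Embedding(vocab, dim)
classifier = nn.Linear(dim, 2)

tokens = torch.randint(0, vocab, (1, seq_len))
emb = embed(tokens).detach().requires_grad_(True)  # attribute w.r.t. embeddings

logits = classifier(emb.mean(dim=1))               # mean-pool, then classify
logits[0, 1].backward()                            # gradient of one class logit

# Per-token relevance: gradient x input, summed over embedding dimensions.
saliency = (emb.grad * emb).sum(dim=-1).squeeze(0)
print(saliency)  # one attribution score per input token
```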
F. Zhang, N. Nanda - arXiv preprint arXiv:2309.16042, 2023 - arxiv.org
Mechanistic interpretability seeks to understand the internal mechanisms of machine learning models, where localization--identifying the important model components--is a key …
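The localization step referred to here is commonly performed with activation patching, the subject of the cited arXiv:2309.16042. A minimal sketch follows, assuming a toy MLP in place of a language model and a simple logit-difference metric; both are illustrative stand-ins.

```python
# Minimal sketch of activation patching for localization: run a "clean"
# and a "corrupted" input, then splice the clean activation of one layer
# into the corrupted run and measure how much of the output it restores.
# The toy MLP stands in for a transformer component; illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                      nn.Linear(8, 8), nn.ReLU(),
                      nn.Linear(8, 2))

clean, corrupted = torch.randn(1, 4), torch.randn(1, 4)
cache = {}

def save_hook(module, inp, out):
    cache["mid"] = out.detach()        # cache the clean activation

def patch_hook(module, inp, out):
    return cache["mid"]                # overwrite with the cached activation

layer = model[2]                       # component whose importance we test

h = layer.register_forward_hook(save_hook)
clean_out = model(clean)
h.remove()

corrupted_out = model(corrupted)

h = layer.register_forward_hook(patch_hook)
patched_out = model(corrupted)         # corrupted run with one clean activation
h.remove()

# A component is "important" if patching it moves the corrupted output
# back toward the clean output (here measured with a simple logit diff).
metric = lambda out: (out[0, 0] - out[0, 1]).item()
print(metric(clean_out), metric(corrupted_out), metric(patched_out))
```

In PyTorch, a forward hook that returns a value replaces the module's output, which is what makes the splice possible without modifying the model itself.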
Explainability methods for NLP systems encounter a version of the fundamental problem of causal inference: for a given ground-truth input text, we never truly observe the …
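To illustrate the problem this snippet raises: a standard workaround is to construct an approximate counterfactual text by editing one concept while holding everything else fixed, then read the output difference as that concept's causal effect. The keyword-count scorer below is a hypothetical stand-in for a real neural model.

```python
# The "fundamental problem" above: for a factual text we observe the
# model's output, but never its output on the true counterfactual text.
# Common approximation: hand-edit one concept and treat the output
# difference as that concept's causal effect on the prediction.

def model_score(text: str) -> float:
    """Placeholder sentiment scorer (a real system would be a neural model)."""
    positive, negative = {"great", "friendly"}, {"awful", "rude"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)

factual = "the food was great but the service was rude"
# Hand-constructed counterfactual: the service concept flipped, all else fixed.
counterfactual = "the food was great but the service was friendly"

effect = model_score(counterfactual) - model_score(factual)
print(f"estimated causal effect of the service concept: {effect}")
```

Because the edited text is only an approximation of the true counterfactual, such estimates inherit any confounds introduced by the edit itself, which is precisely the difficulty the snippet points to.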