Interpretability at scale: Identifying causal mechanisms in Alpaca

Z Wu, A Geiger, T Icard, C Potts… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Obtaining human-interpretable explanations of large, general-purpose language models is
an urgent goal for AI safety. However, it is just as important that our interpretability methods …

Bridging causal discovery and large language models: A comprehensive survey of integrative approaches and future directions

G Wan, Y Wu, M Hu, Z Chu, S Li - arXiv preprint arXiv:2402.11068, 2024 - arxiv.org
Causal discovery (CD) and Large Language Models (LLMs) represent two emerging fields
of study with significant implications for artificial intelligence. Despite their distinct origins …

Bridging the human-AI knowledge gap: Concept discovery and transfer in AlphaZero

L Schut, N Tomasev, T McGrath, D Hassabis… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence (AI) systems have made remarkable progress, attaining super-human
performance across various domains. This presents us with an opportunity to further human …

Causal-structure driven augmentations for text OOD generalization

A Feder, Y Wald, C Shi, S Saria… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
The reliance of text classifiers on spurious correlations can lead to poor generalization at
deployment, raising concerns about their use in safety-critical domains such as healthcare …

A primer on the inner workings of transformer-based language models

J Ferrando, G Sarti, A Bisazza… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid progress of research aimed at interpreting the inner workings of advanced
language models has highlighted a need for contextualizing the insights gained from years …

Concept-based explainable artificial intelligence: A survey

E Poeta, G Ciravegna, E Pastor, T Cerquitelli… - arXiv preprint arXiv …, 2023 - arxiv.org
The field of explainable artificial intelligence emerged in response to the growing need for
more transparent and reliable models. However, using raw features to provide explanations …

Faithful explanations of black-box NLP models using LLM-generated counterfactuals

Y Gat, N Calderon, A Feder, A Chapanin… - arXiv preprint arXiv …, 2023 - arxiv.org
Causal explanations of the predictions of NLP systems are essential to ensure safety and
establish trust. Yet, existing methods often fall short of explaining model predictions …

ScoNe: Benchmarking negation reasoning in language models with fine-tuning and in-context learning

JS She, C Potts, SR Bowman, A Geiger - arXiv preprint arXiv:2305.19426, 2023 - arxiv.org
A number of recent benchmarks seek to assess how well models handle natural language
negation. However, these benchmarks lack the controlled example paradigms that would …

Mission: Impossible language models

J Kallini, I Papadimitriou, R Futrell, K Mahowald… - arXiv preprint arXiv …, 2024 - arxiv.org
Chomsky and others have very directly claimed that large language models (LLMs) are
equally capable of learning languages that are possible and impossible for humans to learn …

A glitch in the matrix? Locating and detecting language model grounding with Fakepedia

G Monea, M Peyrard, M Josifoski, V Chaudhary… - ACL, 2024 - hal.science
Large language models (LLMs) have an impressive ability to draw on novel information
supplied in their context. Yet the mechanisms underlying this contextual grounding remain …