A rigorous study of integrated gradients method and extensions to internal neuron attributions

C Wang, X Liu, Y Yue, X Tang, T Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As
LLMs find applications across diverse domains, the reliability and accuracy of their outputs …

Cited by 173 · Related articles · All 2 versions

[PDF] arxiv.org

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Cited by 214 · Related articles · All 3 versions

[PDF] acm.org

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org

Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

Cited by 400 · Related articles · All 5 versions

[PDF] openreview.net

Toward transparent AI: A survey on interpreting the inner structures of deep neural networks

T Räuker, A Ho, S Casper… - 2023 IEEE conference …, 2023 - ieeexplore.ieee.org

The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However …

Cited by 185 · Related articles · All 5 versions

[PDF] aaai.org

Journey to the center of the knowledge neurons: Discoveries of language-independent knowledge neurons and degenerate knowledge neurons

Y Chen, P Cao, Y Chen, K Liu, J Zhao - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Pre-trained language models (PLMs) contain vast amounts of factual knowledge, but how
the knowledge is stored in the parameters remains unclear. This paper delves into the …

Cited by 33 · Related articles · All 3 versions

[PDF] arxiv.org

Adversarial attacks and defenses in explainable artificial intelligence: A survey

H Baniecki, P Biecek - Information Fusion, 2024 - Elsevier

Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging
and trusting statistical and deep learning models, as well as interpreting their predictions …

Cited by 73 · Related articles · All 5 versions

[PDF] springer.com

Explainable generative AI (GenXAI): A survey, conceptualization, and research agenda

J Schneider - Artificial Intelligence Review, 2024 - Springer

Generative AI (GenAI) represents a shift from AI's ability to “recognize” to its ability to
“generate” solutions for a wide range of tasks. As generated solutions and applications grow …

Cited by 15 · Related articles · All 3 versions

[PDF] arxiv.org

A comprehensive study of knowledge editing for large language models

N Zhang, Y Yao, B Tian, P Wang, S Deng… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have shown extraordinary capabilities in understanding
and generating text that closely mirrors human communication. However, a primary …

Cited by 30 · Related articles · All 2 versions

[PDF] arxiv.org

Sequential Integrated Gradients: a simple but effective method for explaining language models

J Enguehard - arXiv preprint arXiv:2305.15853, 2023 - arxiv.org

Several explanation methods such as Integrated Gradients (IG) can be characterised as
path-based methods, as they rely on a straight line between the data and an uninformative …

Cited by 37 · Related articles · All 3 versions

[PDF] neurips.cc

Stability guarantees for feature attributions with multiplicative smoothing

A Xue, R Alur, E Wong - Advances in Neural Information …, 2024 - proceedings.neurips.cc

Explanation methods for machine learning models tend not to provide any formal
guarantees and may not reflect the underlying decision-making process. In this work, we …

Cited by 3 · Related articles · All 7 versions
