Survey on factuality in large language models: Knowledge, retrieval and domain-specificity

C Wang, X Liu, Y Yue, X Tang, T Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As
LLMs find applications across diverse domains, the reliability and accuracy of their outputs …

AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

Toward transparent AI: A survey on interpreting the inner structures of deep neural networks

T Räuker, A Ho, S Casper… - 2023 IEEE Conference …, 2023 - ieeexplore.ieee.org
The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However …

Journey to the center of the knowledge neurons: Discoveries of language-independent knowledge neurons and degenerate knowledge neurons

Y Chen, P Cao, Y Chen, K Liu, J Zhao - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Pre-trained language models (PLMs) contain vast amounts of factual knowledge, but how
the knowledge is stored in the parameters remains unclear. This paper delves into the …

Adversarial attacks and defenses in explainable artificial intelligence: A survey

H Baniecki, P Biecek - Information Fusion, 2024 - Elsevier
Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging
and trusting statistical and deep learning models, as well as interpreting their predictions …

Explainable generative AI (GenXAI): A survey, conceptualization, and research agenda

J Schneider - Artificial Intelligence Review, 2024 - Springer
Generative AI (GenAI) represents a shift from AI's ability to “recognize” to its ability to
“generate” solutions for a wide range of tasks. As generated solutions and applications grow …

A comprehensive study of knowledge editing for large language models

N Zhang, Y Yao, B Tian, P Wang, S Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have shown extraordinary capabilities in understanding
and generating text that closely mirrors human communication. However, a primary …

Sequential Integrated Gradients: A simple but effective method for explaining language models

J Enguehard - arXiv preprint arXiv:2305.15853, 2023 - arxiv.org
Several explanation methods such as Integrated Gradients (IG) can be characterised as
path-based methods, as they rely on a straight line between the data and an uninformative …
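For context, the standard Integrated Gradients attribution (Sundararajan et al., 2017), which this paper builds on, integrates the gradients of a model F along the straight-line path from an uninformative baseline x' to the input x; the "straight line" mentioned in the snippet refers to this path. A sketch of the formula for feature i:

\[ \mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i} \, d\alpha \]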

Stability guarantees for feature attributions with multiplicative smoothing

A Xue, R Alur, E Wong - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Explanation methods for machine learning models tend not to provide any formal
guarantees and may not reflect the underlying decision-making process. In this work, we …