Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear, and this lack of …

Rationalization for explainable NLP: a survey

S Gurrapu, A Kulkarni, L Huang… - Frontiers in Artificial …, 2023 - frontiersin.org
Recent advances in deep learning have improved the performance of many Natural
Language Processing (NLP) tasks such as translation, question-answering, and text …

Ethics sheet for automatic emotion recognition and sentiment analysis

SM Mohammad - Computational Linguistics, 2022 - direct.mit.edu
The importance and pervasiveness of emotions in our lives make affective computing a tremendously important and vibrant line of work. Systems for automatic emotion recognition …

Unirex: A unified learning framework for language model rationale extraction

A Chan, M Sanjabi, L Mathias, L Tan… - International …, 2022 - proceedings.mlr.press
An extractive rationale explains a language model's (LM's) prediction on a given task
instance by highlighting the text inputs that most influenced the prediction. Ideally, rationale …

Explainable AI approaches in deep learning: Advancements, applications and challenges

MT Hosain, JR Jim, MF Mridha, MM Kabir - Computers and Electrical …, 2024 - Elsevier
Explainable Artificial Intelligence refers to developing artificial intelligence models and systems that can provide clear, understandable, and transparent explanations for their …

DARE: disentanglement-augmented rationale extraction

L Yue, Q Liu, Y Du, Y An, L Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Rationale extraction can be considered a straightforward method of improving model explainability, where rationales are a subsequence of the original inputs, and can be …

Necessity and sufficiency for explaining text classifiers: A case study in hate speech detection

E Balkir, I Nejadgholi, KC Fraser… - arXiv preprint arXiv …, 2022 - arxiv.org
We present a novel feature attribution method for explaining text classifiers and analyze it in the context of hate speech detection. Although feature attribution models usually provide a …

Follow the successful herd: Towards explanations for improved use and mental models of natural language systems

M Brachman, Q Pan, HJ Do, C Dugan… - Proceedings of the 28th …, 2023 - dl.acm.org
While natural language systems continue to improve, they are still imperfect. If a user has a better understanding of how a system works, they may be able to better accomplish their …

Exploring faithful rationale for multi-hop fact verification via salience-aware graph learning

J Si, Y Zhu, D Zhou - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
The opaqueness of multi-hop fact verification models creates a pressing need for explainability. One feasible way is to extract rationales, a subset of inputs, where the …

SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

T Xu, S Wu, S Diao, X Liu, X Wang, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) often generate inaccurate or fabricated information and
generally fail to indicate their confidence, which limits their broader applications. Previous …