Interpretable deep learning: Interpretation, interpretability, trustworthiness, and beyond

X Li, H Xiong, X Li, X Wu, X Zhang, J Liu, J Bian… - … and Information Systems, 2022 - Springer
Deep neural networks have been well-known for their superb handling of various machine
learning and artificial intelligence tasks. However, due to their over-parameterized black-box …

Perturbation-based methods for explaining deep neural networks: A survey

M Ivanovs, R Kadikis, K Ozols - Pattern Recognition Letters, 2021 - Elsevier
Deep neural networks (DNNs) have achieved state-of-the-art results in a broad range of
tasks, in particular the ones dealing with the perceptual data. However, full-scale application …

Acquisition of chess knowledge in AlphaZero

T McGrath, A Kapishnikov, N Tomašev… - Proceedings of the …, 2022 - National Acad Sciences
We analyze the knowledge acquired by AlphaZero, a neural network engine that learns
chess solely by playing against itself yet becomes capable of outperforming human chess …

Explainability in deep reinforcement learning

A Heuillet, F Couthouis, N Díaz-Rodríguez - Knowledge-Based Systems, 2021 - Elsevier
A large set of the explainable Artificial Intelligence (XAI) literature is emerging on feature
relevance techniques to explain a deep neural network (DNN) output or explaining models …

Unmasking Clever Hans predictors and assessing what machines really learn

S Lapuschkin, S Wäldchen, A Binder… - Nature …, 2019 - nature.com
Current learning machines have successfully solved hard application problems, reaching
high accuracy and displaying seemingly intelligent behavior. Here we apply recent …

Explainable AI and reinforcement learning—a systematic review of current approaches and trends

L Wells, T Bednarz - Frontiers in artificial intelligence, 2021 - frontiersin.org
Research into Explainable Artificial Intelligence (XAI) has been increasing in recent years as
a response to the need for increased transparency and trust in AI. This is particularly …

Explainable deep reinforcement learning: state of the art and challenges

GA Vouros - ACM Computing Surveys, 2022 - dl.acm.org
Interpretability, explainability, and transparency are key issues to introducing artificial
intelligence methods in many critical domains. This is important due to ethical concerns and …

Explainable reinforcement learning: A survey and comparative review

S Milani, N Topin, M Veloso, F Fang - ACM Computing Surveys, 2024 - dl.acm.org
Explainable reinforcement learning (XRL) is an emerging subfield of explainable machine
learning that has attracted considerable attention in recent years. The goal of XRL is to …

Scalable agent alignment via reward modeling: a research direction

J Leike, D Krueger, T Everitt, M Martic, V Maini… - arXiv preprint arXiv …, 2018 - arxiv.org
One obstacle to applying reinforcement learning algorithms to real-world problems is the
lack of suitable reward functions. Designing such reward functions is difficult in part because …

Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations

D Brown, W Goo, P Nagarajan… - … conference on machine …, 2019 - proceedings.mlr.press
A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to
significantly outperform the demonstrator. This is because IRL typically seeks a reward …