A survey on neural network interpretability

Y Zhang, P Tiňo, A Leonardis… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Along with the great success of deep neural networks, there is also growing concern about
their black-box nature. The interpretability issue affects people's trust in deep learning …

Analysis methods in neural language processing: A survey

Y Belinkov, J Glass - … of the Association for Computational Linguistics, 2019 - direct.mit.edu
The field of natural language processing has seen impressive progress in recent years, with
neural network models replacing many of the traditional systems. A plethora of new models …

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

Transformer feed-forward layers are key-value memories

M Geva, R Schuster, J Berant, O Levy - arXiv preprint arXiv:2012.14913, 2020 - arxiv.org
Feed-forward layers constitute two-thirds of a transformer model's parameters, yet their role
in the network remains under-explored. We show that feed-forward layers in transformer …
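
The key-value-memory reading that the title refers to can be sketched in a few lines of Python. Everything below (sizes, random weights, the top-k inspection) is an illustrative assumption rather than the paper's own code: each column of the first feed-forward matrix acts as a key matched against the token representation, the corresponding row of the second matrix is the value it retrieves, and the block's output is a coefficient-weighted sum of that fixed set of value vectors.

import numpy as np

# Sizes and weights are made up for illustration; in a real model they are
# learned, with d_ff typically several times d_model.
d_model, d_ff = 8, 32
rng = np.random.default_rng(0)
W_in = rng.normal(size=(d_model, d_ff))    # each column is a "key"
W_out = rng.normal(size=(d_ff, d_model))   # each row is the matching "value"

def ffn(x):
    # Memory coefficients: how strongly each key matches the input token.
    m = np.maximum(x @ W_in, 0.0)          # ReLU(x . k_i) for every key k_i
    # Output: coefficient-weighted sum of the value vectors.
    return m @ W_out                       # sum_i m_i * v_i

x = rng.normal(size=d_model)               # one token representation
y = ffn(x)

# Which "memory cells" fire most strongly for this input?
coeffs = np.maximum(x @ W_in, 0.0)
top = np.argsort(coeffs)[::-1][:5]
print("output:", y.round(2))
print("top activated memory cells:", top, coeffs[top].round(2))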

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model

M Hanna, O Liu, A Variengien - Advances in Neural …, 2024 - proceedings.neurips.cc
Pre-trained language models can be surprisingly adept at tasks they were not explicitly
trained on, but how they implement these capabilities is poorly understood. In this paper, we …

Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI

A Jacovi, A Marasović, T Miller… - Proceedings of the 2021 …, 2021 - dl.acm.org
Trust is a central component of the interaction between people and AI, in that 'incorrect' levels
of trust may cause misuse, abuse or disuse of the technology. But what, precisely, is the …

Finding neurons in a haystack: Case studies with sparse probing

W Gurnee, N Nanda, M Pauly, K Harvey… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite rapid adoption and deployment of large language models (LLMs), the internal
computations of these models remain opaque and poorly understood. In this work, we seek …
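
A sparse probe of the kind the title mentions can be illustrated roughly as follows. The data, the planted signal, and the use of L1-regularized logistic regression are assumptions made for the sketch, not the paper's procedure; the point is only that a probe forced to be sparse nominates a small set of candidate neurons.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: activations of 512 neurons for 2000 tokens, plus a binary
# label per token for some feature of interest. A weak signal is planted in
# neuron 37 so the probe has something to find.
rng = np.random.default_rng(0)
acts = rng.normal(size=(2000, 512))
labels = (acts[:, 37] + 0.3 * rng.normal(size=2000) > 0).astype(int)

# An L1-penalized linear probe drives most weights to exactly zero, so the
# surviving coefficients point at a handful of candidate neurons.
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
probe.fit(acts, labels)

selected = np.flatnonzero(probe.coef_[0])
print("neurons with non-zero probe weight:", selected)
print("training accuracy of the sparse probe:", round(probe.score(acts, labels), 3))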

Neural machine translation: A review

F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …

Natural language descriptions of deep visual features

E Hernandez, S Schwettmann, D Bau… - International …, 2021 - openreview.net
Some neurons in deep networks specialize in recognizing highly specific perceptual,
structural, or semantic features of inputs. In computer vision, techniques exist for identifying …

Head2toe: Utilizing intermediate representations for better transfer learning

U Evci, V Dumoulin, H Larochelle… - … on Machine Learning, 2022 - proceedings.mlr.press
Transfer-learning methods aim to improve performance in a data-scarce target domain using
a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing …
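
The contrast the snippet sets up, between probing only the final layer and also drawing on intermediate representations, can be sketched as follows. The backbone, data, and classifier are placeholder assumptions, and any feature-selection step the method itself applies is omitted; the sketch only shows frozen features from several layers being concatenated before a single linear head is fit.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder "frozen backbone": pretend activations from three layers of a
# pretrained network were already extracted for 500 target-domain examples.
rng = np.random.default_rng(0)
layer_feats = [rng.normal(size=(500, d)) for d in (256, 512, 1024)]
labels = rng.integers(0, 10, size=500)

# Plain linear probing: a linear classifier on the final-layer features only.
last_probe = LogisticRegression(max_iter=1000).fit(layer_feats[-1], labels)

# Using intermediate representations as well: expose every layer to the linear
# head by concatenating the frozen features before fitting.
all_feats = np.concatenate(layer_feats, axis=1)
all_probe = LogisticRegression(max_iter=1000).fit(all_feats, labels)

# On synthetic data the scores mean nothing; the point is the shape of the
# pipeline: frozen features in, one linear head out.
print("final-layer feature dim:", layer_feats[-1].shape[1])
print("concatenated feature dim:", all_feats.shape[1])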