Probing classifiers: Promises, shortcomings, and advances

Y Belinkov - Computational Linguistics, 2022 - direct.mit.edu
Probing classifiers have emerged as one of the prominent methodologies for interpreting
and analyzing deep neural network models of natural language processing. The basic idea …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Factual probing is [mask]: Learning vs. learning to recall

Z Zhong, D Friedman, D Chen - arXiv preprint arXiv:2104.05240, 2021 - arxiv.org
Petroni et al. (2019) demonstrated that it is possible to retrieve world facts from a pre-trained
language model by expressing them as cloze-style prompts and interpret the model's …

A survey of probing-based interpretability methods in natural language processing

鞠天杰, 刘功申, 张倬胜, 张茹 - 计算机学报, 2024 - cjc.ict.ac.cn
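Abstract: With the widespread application of large-scale pre-trained models, many areas of natural language processing (such as text classification and machine translation)
have made considerable progress. However, limited by the "black-box" nature of pre-trained models, their internal decision patterns and encoded knowledge …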

Emergent structures and training dynamics in large language models

R Teehan, M Clinciu, O Serikov… - … # 5--Workshop on …, 2022 - aclanthology.org
Large language models have achieved success on a number of downstream tasks,
particularly in a few and zero-shot manner. As a consequence, researchers have been …

Interventional probing in high dimensions: An NLI case study

J Rozanova, M Valentino, L Cordeiro… - arXiv preprint arXiv …, 2023 - arxiv.org
Probing strategies have been shown to detect the presence of various linguistic features in
large language models; in particular, semantic features intermediate to the "natural logic" …

Exploring the role of BERT token representations to explain sentence probing results

H Mohebbi, A Modarressi, MT Pilehvar - arXiv preprint arXiv:2104.01477, 2021 - arxiv.org
Several studies have been carried out on revealing linguistic features captured by BERT.
This is usually achieved by training a diagnostic classifier on the representations obtained …

Operationalising representation in natural language processing

J Harding - 2023 - journals.uchicago.edu
Neural models achieve high performance on a variety of natural language processing (NLP)
benchmark tasks. How models perform these tasks, though, is notoriously poorly …

Predicting fine-tuning performance with probing

Z Zhu, S Shahtalebi, F Rudzicz - arXiv preprint arXiv:2210.07352, 2022 - arxiv.org
Large NLP models have recently shown impressive performance in language
understanding tasks, typically evaluated by their fine-tuned performance. Alternatively …

Pixology: Probing the Linguistic and Visual Capabilities of Pixel-based Language Models

K Tatariya, V Araujo, T Bauwens… - arXiv preprint arXiv …, 2024 - arxiv.org
Pixel-based language models have emerged as a compelling alternative to subword-based
language modelling, particularly because they can represent virtually any script. PIXEL, a …