Teach me to explain: A review of datasets for explainable natural language processing

S Wiegreffe, A Marasović - arXiv preprint arXiv:2102.12060, 2021 - arxiv.org
Explainable NLP (ExNLP) has increasingly focused on collecting human-annotated textual
explanations. These explanations are used downstream in three ways: as data …

Bridging the gap: A survey on integrating (human) feedback for natural language generation

P Fernandes, A Madaan, E Liu, A Farinhas… - Transactions of the …, 2023 - direct.mit.edu
Natural language generation has witnessed significant advancements due to the training of
large language models on vast internet-scale datasets. Despite these advancements, there …

The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

B Plank - arXiv preprint arXiv:2211.02570, 2022 - arxiv.org
Human variation in labeling is often considered noise. Annotation projects for machine
learning (ML) aim at minimizing human label variation, with the assumption to maximize …

Learning from disagreement: A survey

AN Uma, T Fornaciari, D Hovy, S Paun, B Plank… - Journal of Artificial …, 2021 - jair.org
Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer
evidence that humans disagree, from objective tasks such as part-of-speech tagging to more …

We're afraid language models aren't modeling ambiguity

A Liu, Z Wu, J Michael, A Suhr, P West, A Koller… - arXiv preprint arXiv …, 2023 - arxiv.org
Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of
human language understanding, allowing us to anticipate misunderstanding as …

Investigating reasons for disagreement in natural language inference

NJ Jiang, MC de Marneffe - Transactions of the Association for …, 2022 - direct.mit.edu
We investigate how disagreement in natural language inference (NLI) annotation arises. We
developed a taxonomy of disagreement sources with 10 categories spanning 3 high-level …

Culturally aware natural language inference

J Huang, D Yang - Findings of the Association for Computational …, 2023 - aclanthology.org
Humans produce and consume language in a particular cultural context, which includes
knowledge about specific norms and practices. A listener's awareness of the cultural context …

I like fish, especially dolphins: Addressing contradictions in dialogue modeling

Y Nie, M Williamson, M Bansal, D Kiela… - arXiv preprint arXiv …, 2020 - arxiv.org
To quantify how well natural language understanding models can capture consistency in a
general conversation, we introduce the DialoguE COntradiction DEtection task (DECODE) …

Metacognitive prompting improves understanding in large language models

Y Wang, Y Zhao - arXiv preprint arXiv:2308.05342, 2023 - arxiv.org
In Large Language Models (LLMs), there have been consistent advancements in task-
specific performance, largely influenced by effective prompt design. While recent research …

Stop measuring calibration when humans disagree

J Baan, W Aziz, B Plank, R Fernandez - arXiv preprint arXiv:2210.16133, 2022 - arxiv.org
Calibration is a popular framework to evaluate whether a classifier knows when it does not
know, i.e., its predictive probabilities are a good indication of how likely a prediction is to be …