Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Typology of risks of generative text-to-image models

C Bird, E Ungless, A Kasirzadeh - Proceedings of the 2023 AAAI/ACM …, 2023 - dl.acm.org
This paper investigates the direct risks and harms associated with modern text-to-image
generative models, such as DALL-E and Midjourney, through a comprehensive literature …

Bridging the gap: A survey on integrating (human) feedback for natural language generation

P Fernandes, A Madaan, E Liu, A Farinhas… - Transactions of the …, 2023 - direct.mit.edu
Natural language generation has witnessed significant advancements due to the training of
large language models on vast internet-scale datasets. Despite these advancements, there …

CrowdWorkSheets: Accounting for individual and collective identities underlying crowdsourced dataset annotation

M Díaz, I Kivlichan, R Rosen, D Baker… - Proceedings of the …, 2022 - dl.acm.org
Human annotated data plays a crucial role in machine learning (ML) research and
development. However, the ethical considerations around the processes and decisions that …

Ethics sheet for automatic emotion recognition and sentiment analysis

SM Mohammad - Computational Linguistics, 2022 - direct.mit.edu
The importance and pervasiveness of emotions in our lives makes affective computing a
tremendously important and vibrant line of work. Systems for automatic emotion recognition …

ConvAbuse: Data, analysis, and benchmarks for nuanced abuse detection in conversational AI

AC Curry, G Abercrombie, V Rieser - arXiv preprint arXiv:2109.09483, 2021 - arxiv.org
We present the first English corpus study on abusive language towards three conversational
AI systems gathered "in the wild": an open-domain social bot, a rule-based chatbot, and a …

Whose ground truth? Accounting for individual and collective identities underlying dataset annotation

E Denton, M Díaz, I Kivlichan, V Prabhakaran… - arXiv preprint arXiv …, 2021 - arxiv.org
Human annotations play a crucial role in machine learning (ML) research and development.
However, the ethical considerations around the processes and decisions that go into …

Potato: The portable text annotation tool

J Pei, A Ananthasubramaniam, X Wang, N Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
We present POTATO, the Portable text annotation tool, a free, fully open-sourced annotation
system that 1) supports labeling many types of text and multimodal data; 2) offers easy-to …

Bugs in the data: How ImageNet misrepresents biodiversity

AS Luccioni, D Rolnick - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org
ImageNet-1k is a dataset often used for benchmarking machine learning (ML) models and
evaluating tasks such as image recognition and object detection. Wild animals make up …

'Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP

A Rogers, T Baldwin, K Leins - arXiv preprint arXiv:2109.06598, 2021 - arxiv.org
A key part of the NLP ethics movement is responsible use of data, but exactly what that
means or how it can be best achieved remain unclear. This position paper discusses the …