Whose opinions do language models reflect?

S Santurkar, E Durmus, F Ladhak… - International …, 2023 - proceedings.mlr.press
Language models (LMs) are increasingly being used in open-ended contexts,
where the opinions they reflect in response to subjective queries can have a profound …

Auditing large language models: a three-layered approach

J Mökander, J Schuett, HR Kirk, L Floridi - AI and Ethics, 2024 - Springer
Large language models (LLMs) represent a major advance in artificial intelligence (AI)
research. However, the widespread use of LLMs is also coupled with significant ethical and …

From pretraining data to language models to downstream tasks: Tracking the trails of political biases leading to unfair NLP models

S Feng, CY Park, Y Liu, Y Tsvetkov - arXiv preprint arXiv:2305.08283, 2023 - arxiv.org
Language models (LMs) are pretrained on diverse data sources, including news, discussion
forums, books, and online encyclopedias. A significant portion of this data includes opinions …

A survey of language model confidence estimation and calibration

J Geng, F Cai, Y Wang, H Koeppl, P Nakov… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models (LMs) have demonstrated remarkable capabilities across a wide range of
tasks in various domains. Despite their impressive performance, the reliability of their output …

ToxiGen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection

T Hartvigsen, S Gabriel, H Palangi, M Sap… - arXiv preprint arXiv …, 2022 - arxiv.org
Toxic language detection systems often falsely flag text that contains minority group
mentions as toxic, as those groups are often the targets of online hate. Such over-reliance …

Towards measuring the representation of subjective global opinions in language models

E Durmus, K Nguyen, TI Liao, N Schiefer… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) may not equitably represent diverse global perspectives on
societal issues. In this paper, we develop a quantitative framework to evaluate whose …

Evaluating the social impact of generative AI systems in systems and society

I Solaiman, Z Talat, W Agnew, L Ahmad… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI systems across modalities, ranging from text, image, audio, and video, have
broad social impacts, but there exists no official standard for means of evaluating those …

Bridging the gap: A survey on integrating (human) feedback for natural language generation

P Fernandes, A Madaan, E Liu, A Farinhas… - Transactions of the …, 2023 - direct.mit.edu
Natural language generation has witnessed significant advancements due to the training of
large language models on vast internet-scale datasets. Despite these advancements, there …

The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

B Plank - arXiv preprint arXiv:2211.02570, 2022 - arxiv.org
Human variation in labeling is often considered noise. Annotation projects for machine
learning (ML) aim at minimizing human label variation, with the assumption to maximize …

NLPositionality: Characterizing design biases of datasets and models

S Santy, JT Liang, RL Bras, K Reinecke… - arXiv preprint arXiv …, 2023 - arxiv.org
Design biases in NLP systems, such as performance differences for different populations,
often stem from their creator's positionality, i.e., views and lived experiences shaped by …