Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Llama 2: Open foundation and fine-tuned chat models

H Touvron, L Martin, K Stone, P Albert… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large
language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine …

Scaling instruction-finetuned language models

HW Chung, L Hou, S Longpre, B Zoph, Y Tay… - Journal of Machine …, 2024 - jmlr.org
Finetuning language models on a collection of datasets phrased as instructions has been
shown to improve model performance and generalization to unseen tasks. In this paper we …

Evaluating the social impact of generative AI systems in systems and society

I Solaiman, Z Talat, W Agnew, L Ahmad… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI systems across modalities, ranging from text, image, audio, and video, have
broad social impacts, but there exists no official standard for means of evaluating those …

Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction

R Shelby, S Rismani, K Henne, AJ Moon… - Proceedings of the …, 2023 - dl.acm.org
Understanding the landscape of potential harms from algorithmic systems enables
practitioners to better anticipate consequences of the systems they build. It also supports the …

Having beer after prayer? Measuring cultural bias in large language models

T Naous, MJ Ryan, A Ritter, W Xu - arXiv preprint arXiv:2305.14456, 2023 - arxiv.org
As the reach of large language models (LMs) expands globally, their ability to cater to
diverse cultural contexts becomes crucial. Despite advancements in multilingual …

Survey on sociodemographic bias in natural language processing

V Gupta, PN Venkit, S Wilson… - arXiv preprint arXiv …, 2023 - researchgate.net
Deep neural networks often learn unintended bias during training, which might have harmful
effects when deployed in real-world settings. This work surveys 214 papers related to …

Fairness in language models beyond English: Gaps and challenges

K Ramesh, S Sitaram, M Choudhury - arXiv preprint arXiv:2302.12578, 2023 - arxiv.org
With language models becoming increasingly ubiquitous, it has become essential to
address their inequitable treatment of diverse demographic groups and factors. Most …

This prompt is measuring <mask>: evaluating bias evaluation in language models

S Goldfarb-Tarrant, E Ungless, E Balkir… - arXiv preprint arXiv …, 2023 - arxiv.org
Bias research in NLP seeks to analyse models for social biases, thus helping NLP
practitioners uncover, measure, and mitigate social harms. We analyse the body of work that …

Building socio-culturally inclusive stereotype resources with community engagement

S Dev, J Goyal, D Tewari, S Dave… - Advances in Neural …, 2024 - proceedings.neurips.cc
With rapid development and deployment of generative language models in global settings,
there is an urgent need to also scale our measurements of harm, not just in the number and …