Evaluating large language models: A comprehensive survey

Z Guo, R Jin, C Liu, Y Huang, D Shi, L Yu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a broad
spectrum of tasks. They have attracted significant attention and been deployed in numerous …

Evaluating the social impact of generative ai systems in systems and society

I Solaiman, Z Talat, W Agnew, L Ahmad… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI systems across modalities, spanning text, image, audio, and video, have
broad social impacts, but there is no official standard for evaluating those …

Survey on sociodemographic bias in natural language processing

V Gupta, PN Venkit, S Wilson… - arXiv preprint arXiv …, 2023 - researchgate.net
Deep neural networks often learn unintended bias during training, which might have harmful
effects when deployed in real-world settings. This work surveys 214 papers related to …

Re-contextualizing fairness in NLP: The case of India

S Bhatt, S Dev, P Talukdar, S Dave… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent research has revealed undesirable biases in NLP data and models. However, these
efforts focus on social disparities in the West and are not directly portable to other geo-cultural …

Building socio-culturally inclusive stereotype resources with community engagement

S Dev, J Goyal, D Tewari, S Dave… - Advances in Neural …, 2024 - proceedings.neurips.cc
With rapid development and deployment of generative language models in global settings,
there is an urgent need to also scale our measurements of harm, not just in the number and …

Bias and fairness in large language models: A survey

IO Gallegos, RA Rossi, J Barrow, MM Tanjim… - Computational …, 2024 - direct.mit.edu
Rapid advancements of large language models (LLMs) have enabled the processing,
understanding, and generation of human-like text, with increasing integration into systems …

Which Stereotypes Are Moderated and Under-Moderated in Search Engine Autocompletion?

A Leidinger, R Rogers - Proceedings of the 2023 ACM Conference on …, 2023 - dl.acm.org
Warning: This paper contains content that may be offensive or upsetting. Language
technologies that perpetuate stereotypes actively cement social hierarchies. This study …

Undesirable biases in NLP: Averting a crisis of measurement

O Van der Wal, D Bachmann, A Leidinger… - arXiv preprint arXiv …, 2022 - pure.uva.nl
As Large Language Models and Natural Language Processing (NLP) technology
rapidly develops and spreads into daily life, it becomes crucial to anticipate how its use …

Healthy immigrant effect or under-detection? Examining undiagnosed and unrecognized late-life depression for racialized immigrants and nonimmigrants in Canada

S Lin - The Journals of Gerontology: Series B, 2024 - academic.oup.com
Objectives: Immigrants to Canada tend to have a lower incidence of diagnosed
depression than nonimmigrants. One theory suggests that this "healthy immigrant effect …

Undesirable biases in NLP: Addressing challenges of measurement

O Van der Wal, D Bachmann, A Leidinger… - Journal of Artificial …, 2024 - jair.org
As Large Language Models and Natural Language Processing (NLP) technology
rapidly develop and spread into daily life, it becomes crucial to anticipate how their use …