A survey on fairness in large language models

Y Li, M Du, R Song, X Wang, Y Wang - arXiv preprint arXiv:2308.10149, 2023 - arxiv.org
Large language models (LLMs) have shown powerful performance and promising development
prospects and are widely deployed in the real world. However, LLMs can capture social …

Fairness in deep learning: A survey on vision and language research

O Parraga, MD More, CM Oliveira, NS Gavenski… - ACM Computing …, 2023 - dl.acm.org
Despite being responsible for state-of-the-art results in several computer vision and natural
language processing tasks, neural networks have faced harsh criticism due to some of their …

LEACE: Perfect linear concept erasure in closed form

N Belrose, D Schneider-Joseph… - Advances in …, 2024 - proceedings.neurips.cc
Concept erasure aims to remove specified features from a representation. It can
improve fairness (e.g., preventing a classifier from using gender or race) and interpretability …
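For context on what "linear" concept erasure means here: a representation linearly guards a concept when no linear probe predicts it better than a constant, which holds exactly when the feature–concept cross-covariance is zero. The following is a minimal numpy sketch of that idea — projecting out the cross-covariance direction — as an illustration only, not the paper's closed-form LEACE transform (which additionally whitens the representation so the edit is as small as possible); the function and variable names are mine.

import numpy as np

def erase_linear_concept(X, z):
    """Remove the linear signal for a scalar concept z from the rows of X."""
    Xc = X - X.mean(axis=0)            # center features
    zc = z - z.mean()                  # center concept labels
    c = Xc.T @ zc / len(z)             # feature-concept cross-covariance
    u = c / np.linalg.norm(c)          # unit direction carrying the concept
    # Projecting onto the orthogonal complement of u zeroes Cov(X, z),
    # so no linear probe can beat a constant predictor of z.
    return X - np.outer(X @ u, u)

# Toy usage: plant a binary attribute (e.g. gender) in X, then erase it.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=500).astype(float)
X = rng.normal(size=(500, 8)) + np.outer(z, np.arange(8.0))
X_erased = erase_linear_concept(X, z)
# Cross-covariance after erasure is ~0 (up to floating-point error).
print(np.abs((X_erased - X_erased.mean(0)).T @ (z - z.mean()) / len(z)).max())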

Harms of gender exclusivity and challenges in non-binary representation in language technologies

S Dev, M Monajatipoor, A Ovalle… - arXiv preprint arXiv …, 2021 - arxiv.org
Gender is widely discussed in the context of language tasks and when examining the
stereotypes propagated by language models. However, current discussions primarily treat …

A survey on bias in deep NLP

I Garrido-Muñoz, A Montejo-Ráez… - Applied Sciences, 2021 - mdpi.com
Deep neural networks are the dominant approach in many machine learning areas,
including natural language processing (NLP). Thanks to the availability of large corpora …

Linear adversarial concept erasure

S Ravfogel, M Twiton, Y Goldberg… - … on Machine Learning, 2022 - proceedings.mlr.press
Modern neural models trained on textual data rely on pre-trained representations that
emerge without direct supervision. As these representations are increasingly being used in …

Having beer after prayer? Measuring cultural bias in large language models

T Naous, MJ Ryan, A Ritter, W Xu - arXiv preprint arXiv:2305.14456, 2023 - arxiv.org
As the reach of large language models (LMs) expands globally, their ability to cater to
diverse cultural contexts becomes crucial. Despite advancements in multilingual …

On measures of biases and harms in NLP

S Dev, E Sheng, J Zhao, A Amstutz, J Sun… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent studies show that Natural Language Processing (NLP) technologies propagate
societal biases about demographic groups associated with attributes such as gender, race …

Risk taxonomy, mitigation, and assessment benchmarks of large language model systems

T Cui, Y Wang, C Fu, Y Xiao, S Li, X Deng, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have strong capabilities in solving diverse natural language
processing tasks. However, the safety and security issues of LLM systems have become the …

MISGENDERED: Limits of large language models in understanding pronouns

T Hossain, S Dev, S Singh - arXiv preprint arXiv:2306.03950, 2023 - arxiv.org
Content Warning: This paper contains examples of misgendering and erasure that could be
offensive and potentially triggering. Gender bias in language technologies has been widely …