Fairness in large language models: A taxonomic survey

Z Chu, Z Wang, W Zhang - ACM SIGKDD Explorations Newsletter, 2024 - dl.acm.org
Large Language Models (LLMs) have demonstrated remarkable success across various
domains. However, despite their promising performance in numerous real-world …

Personalized language modeling from personalized human feedback

X Li, R Zhou, ZC Lipton, L Leqi - arXiv preprint arXiv:2402.05133, 2024 - arxiv.org
Personalized large language models (LLMs) are designed to tailor responses to individual
user preferences. While Reinforcement Learning from Human Feedback (RLHF) is a …

Personalizing reinforcement learning from human feedback with variational preference learning

S Poddar, Y Wan, H Ivison, A Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning
foundation models to human values and preferences. However, current RLHF techniques …

Are Large Language Models Consistent over Value-laden Questions?

J Moore, T Deshpande, D Yang - arXiv preprint arXiv:2407.02996, 2024 - arxiv.org
Large language models (LLMs) appear to bias their survey answers toward certain values.
Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are …

Beyond accuracy: Investigating error types in GPT-4 responses to USMLE questions

S Roy, A Khatua, F Ghoochani, U Hadler… - Proceedings of the 47th …, 2024 - dl.acm.org
GPT-4 demonstrates high accuracy in medical QA tasks, leading with an accuracy of
86.70%, followed by Med-PaLM 2 at 86.50%. However, around 14% of errors remain …

How (not) to use sociodemographic information for subjective NLP tasks

T Beck, H Schuff, A Lauscher, I Gurevych - arXiv preprint arXiv:2309.07034, 2023 - arxiv.org
Annotators' sociodemographic backgrounds (i.e., the individual compositions of their gender,
age, educational background, etc.) have a strong impact on their decisions when working on …

A Taxonomy of Rater Disagreements: Surveying Challenges & Opportunities from the Perspective of Annotating Online Toxicity

W Zhang, H Guo, ID Kivlichan, V Prabhakaran… - arXiv preprint arXiv …, 2023 - arxiv.org
Toxicity is an increasingly common and severe issue in online spaces. Consequently, a rich
line of machine learning research over the past decade has focused on computationally …

Quantifying the persona effect in LLM simulations

T Hu, N Collier - arXiv preprint arXiv:2402.10811, 2024 - arxiv.org
Large language models (LLMs) have shown remarkable promise in simulating human
language use and behavior. In this study, we delve into the intersection of persona variables …

Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates

A Davani, M Díaz, D Baker, V Prabhakaran - The 2024 ACM Conference …, 2024 - dl.acm.org
Recent years have seen substantial investments in AI-based tools designed to detect
offensive language at scale, aiming to moderate social media platforms and ensure safety …

Improving context-aware preference modeling for language models

S Pitis, Z Xiao, NL Roux, A Sordoni - arXiv preprint arXiv:2407.14916, 2024 - arxiv.org
While finetuning language models from pairwise preferences has proven remarkably
effective, the underspecified nature of natural language presents critical challenges. Direct …