Exploring Safety-Utility Trade-Offs in Personalized Language Models

AR Vijjini, SBR Chowdhury, S Chaturvedi - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) become increasingly integrated into daily applications, it
is essential to ensure they operate fairly across diverse user demographics. In this work, we …

Design-Inclusive Language Models for Responsible Information Access

V Chatrath, O Bamgbose, S Raza - arXiv preprint arXiv:2310.18333, 2023 - arxiv.org
As the use of large language models (LLMs) increases for everyday tasks, appropriate
safeguards must be in place to ensure unbiased and safe output. Recent events highlight …

Understanding the Role of User Profile in the Personalization of Large Language Models

B Wu, Z Shi, HA Rahmani, V Ramineni… - arXiv preprint arXiv …, 2024 - arxiv.org
Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to
enhance performance on a wide range of tasks. However, the precise role of user …

Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models

YL Tuan, X Chen, EM Smith, L Martin, S Batra… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) become easily accessible nowadays, the trade-off
between safety and helpfulness can significantly impact user experience. A model that …

She had Cobalt Blue Eyes: Prompt Testing to Create Aligned and Sustainable Language Models

V Chatrath - 2nd Workshop on Sustainable AI, 2023 - openreview.net
As the use of large language models (LLMs) within society increases, so does the risk of
their misuse. Appropriate safeguards must be in place to ensure LLM outputs uphold the …

Configurable Safety Tuning of Language Models with Synthetic Preference Data

V Gallego - arXiv preprint arXiv:2404.00495, 2024 - arxiv.org
State-of-the-art language model fine-tuning techniques, such as Direct Preference
Optimization (DPO), restrict user control by hard-coding predefined behaviors into the …

Rule-Based Rewards for Language Model Safety

T Mu, A Helyar, J Heidecke, J Achiam… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning-based fine-tuning of large language models (LLMs) on human
preferences has been shown to enhance both their capabilities and safety behavior …

Dissecting Human and LLM Preferences

J Li, F Zhou, S Sun, Y Zhang, H Zhao, P Liu - arXiv preprint arXiv …, 2024 - arxiv.org
As a relative quality comparison of model responses, human and Large Language Model
(LLM) preferences serve as common alignment goals in model fine-tuning and criteria in …

LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases

D Bouchard, MS Chauhan, D Skarbrevik… - arXiv preprint arXiv …, 2025 - arxiv.org
Large Language Models (LLMs) have been observed to exhibit bias in numerous ways,
potentially creating or worsening outcomes for specific groups identified by protected …

Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions

L Cotta, CJ Maddison - arXiv preprint arXiv:2406.07685, 2024 - arxiv.org
Frontier Large Language Models (LLMs) are increasingly being deployed for high-stakes
decision-making. On the other hand, these models are still consistently making predictions …