Exploring Safety-Utility Trade-Offs in Personalized Language Models

AR Vijjini, SBR Chowdhury, S Chaturvedi - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) become increasingly integrated into daily applications, it
is essential to ensure they operate fairly across diverse user demographics. In this work, we …

Design-Inclusive Language Models for Responsible Information Access

V Chatrath, O Bamgbose, S Raza - arXiv preprint arXiv:2310.18333, 2023 - arxiv.org
As the use of large language models (LLMs) increases for everyday tasks, appropriate
safeguards must be in place to ensure unbiased and safe output. Recent events highlight …

Understanding the Role of User Profile in the Personalization of Large Language Models

B Wu, Z Shi, HA Rahmani, V Ramineni… - arXiv preprint arXiv …, 2024 - arxiv.org
Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to
enhance performance on a wide range of tasks. However, the precise role of user …

Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models

YL Tuan, X Chen, EM Smith, L Martin, S Batra… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) become easily accessible nowadays, the trade-off
between safety and helpfulness can significantly impact user experience. A model that …

She had Cobalt Blue Eyes: Prompt Testing to Create Aligned and Sustainable Language Models

V Chatrath - 2nd Workshop on Sustainable AI, 2023 - openreview.net
As the use of large language models (LLMs) within society increases, so does the risk of
their misuse. Appropriate safeguards must be in place to ensure LLM outputs uphold the …

Configurable Safety Tuning of Language Models with Synthetic Preference Data

V Gallego - arXiv preprint arXiv:2404.00495, 2024 - arxiv.org
State-of-the-art language model fine-tuning techniques, such as Direct Preference
Optimization (DPO), restrict user control by hard-coding predefined behaviors into the …

Rule-Based Rewards for Language Model Safety

T Mu, A Helyar, J Heidecke, J Achiam… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning-based fine-tuning of large language models (LLMs) on human
preferences has been shown to enhance both their capabilities and safety behavior …

Dissecting Human and LLM Preferences

J Li, F Zhou, S Sun, Y Zhang, H Zhao, P Liu - arXiv preprint arXiv …, 2024 - arxiv.org
As a relative quality comparison of model responses, human and Large Language Model
(LLM) preferences serve as common alignment goals in model fine-tuning and criteria in …

LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases

D Bouchard, MS Chauhan, D Skarbrevik… - arXiv preprint arXiv …, 2025 - arxiv.org
Large Language Models (LLMs) have been observed to exhibit bias in numerous ways,
potentially creating or worsening outcomes for specific groups identified by protected …

Out-Of-Context Prompting Boosts Fairness and Robustness in Large Language Model Predictions

L Cotta, CJ Maddison - arXiv preprint arXiv:2406.07685, 2024 - arxiv.org
Frontier Large Language Models (LLMs) are increasingly being deployed for high-stakes
decision-making. On the other hand, these models are still consistently making predictions …