X Li, R Zhou, ZC Lipton, L Leqi - arXiv preprint arXiv:2402.05133, 2024 - arxiv.org
Personalized large language models (LLMs) are designed to tailor responses to individual user preferences. While Reinforcement Learning from Human Feedback (RLHF) is a …
Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning foundation models to human values and preferences. However, current RLHF techniques …
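Both snippets above invoke RLHF without spelling out its core step: fitting a reward model to pairwise human preference data before policy optimization. Below is a minimal sketch of the standard Bradley-Terry reward-modeling loss; the function name and inputs are illustrative assumptions, not code from either cited paper.

```python
import torch
import torch.nn.functional as F

def reward_modeling_loss(chosen_scores: torch.Tensor,
                         rejected_scores: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss used to fit an RLHF reward model.

    chosen_scores / rejected_scores: scalar rewards r(x, y) the reward
    model assigns to the human-preferred and dispreferred responses.
    Minimizing this loss raises the modeled probability that the
    preferred response outscores the rejected one.
    """
    # Under Bradley-Terry, P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Illustrative usage with dummy reward scores for three preference pairs
chosen = torch.tensor([1.2, 0.7, 2.1])
rejected = torch.tensor([0.3, 0.9, 1.0])
loss = reward_modeling_loss(chosen, rejected)
```

The fitted reward model then scores candidate responses during a reinforcement-learning stage (typically PPO), which is the pipeline these abstracts take as their starting point.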
Large language models (LLMs) appear to bias their survey answers toward certain values. Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are …
S Roy, A Khatua, F Ghoochani, U Hadler… - Proceedings of the 47th …, 2024 - dl.acm.org
GPT-4 demonstrates high accuracy in medical QA tasks, leading with an accuracy of 86.70%, followed by Med-PaLM 2 at 86.50%. However, an error rate of around 14% remains …
Annotators' sociodemographic backgrounds (i.e., the individual composition of their gender, age, educational background, etc.) have a strong impact on their decisions when working on …
Toxicity is an increasingly common and severe issue in online spaces. Consequently, a rich line of machine learning research over the past decade has focused on computationally …
T Hu, N Collier - arXiv preprint arXiv:2402.10811, 2024 - arxiv.org
Large language models (LLMs) have shown remarkable promise in simulating human language use and behavior. In this study, we delve into the intersection of persona variables …
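Persona-based simulation of the kind this snippet describes typically works by conditioning the model on sociodemographic attributes stated in the prompt. The sketch below is hypothetical; the persona fields and template are assumptions for illustration, not the study's actual protocol.

```python
# Hypothetical persona-conditioned prompting: prepend sociodemographic
# attributes to a question before sending it to an LLM.
def build_persona_prompt(question: str, gender: str, age: int,
                         education: str) -> str:
    persona = (f"You are a {age}-year-old {gender} whose highest "
               f"level of education is {education}.")
    return f"{persona}\n\n{question}"

prompt = build_persona_prompt(
    "Do you consider the following comment toxic? 'You clearly "
    "have no idea what you're talking about.'",
    gender="woman", age=34, education="a bachelor's degree",
)
print(prompt)  # this prompt would then be sent to an LLM API of choice
```

Studies in this vein compare the model's responses across such persona variables against responses from human annotators with matching backgrounds.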
Recent years have seen substantial investments in AI-based tools designed to detect offensive language at scale, aiming to moderate social media platforms and ensure safety …
While finetuning language models from pairwise preferences has proven remarkably effective, the underspecified nature of natural language presents critical challenges. Direct …
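The snippet breaks off at "Direct", most likely Direct Preference Optimization (DPO), the standard method for finetuning directly on pairwise preferences without a separate reward model. A minimal sketch of the DPO objective follows; the tensor names and the beta default are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss.

    Each argument is the summed log-probability of a full response
    under the trainable policy or a frozen reference model; beta
    scales the implicit KL penalty toward the reference.
    """
    # Implicit reward of each response: beta * (log pi(y|x) - log pi_ref(y|x))
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry likelihood that the human-preferred response wins
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Note the same Bradley-Terry form as the reward-modeling loss sketched earlier; DPO folds the reward model into the policy itself, which is what makes the underspecification of pairwise preferences that this abstract raises bite directly on the finetuned model.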