Fairness in large language models: A taxonomic survey

Z Chu, Z Wang, W Zhang - ACM SIGKDD Explorations Newsletter, 2024 - dl.acm.org
Large Language Models (LLMs) have demonstrated remarkable success across various
domains. However, despite their promising performance in numerous real-world …

Personalized language modeling from personalized human feedback

X Li, R Zhou, ZC Lipton, L Leqi - arXiv preprint arXiv:2402.05133, 2024 - arxiv.org
Personalized large language models (LLMs) are designed to tailor responses to individual
user preferences. While Reinforcement Learning from Human Feedback (RLHF) is a …

Personalizing reinforcement learning from human feedback with variational preference learning

S Poddar, Y Wan, H Ivison, A Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning
foundation models to human values and preferences. However, current RLHF techniques …

Are Large Language Models Consistent over Value-laden Questions?

J Moore, T Deshpande, D Yang - arXiv preprint arXiv:2407.02996, 2024 - arxiv.org
Large language models (LLMs) appear to bias their survey answers toward certain values.
Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are …

Beyond accuracy: Investigating error types in GPT-4 responses to USMLE questions

S Roy, A Khatua, F Ghoochani, U Hadler… - Proceedings of the 47th …, 2024 - dl.acm.org
GPT-4 demonstrates high accuracy in medical QA tasks, leading with an accuracy of
86.70%, followed by Med-PaLM 2 at 86.50%. However, around 14% of errors remain …

How (not) to use sociodemographic information for subjective NLP tasks

T Beck, H Schuff, A Lauscher, I Gurevych - arXiv preprint arXiv:2309.07034, 2023 - arxiv.org
Annotators' sociodemographic backgrounds (i.e., the individual compositions of their gender,
age, educational background, etc.) have a strong impact on their decisions when working on …

A Taxonomy of Rater Disagreements: Surveying Challenges & Opportunities from the Perspective of Annotating Online Toxicity

W Zhang, H Guo, ID Kivlichan, V Prabhakaran… - arXiv preprint arXiv …, 2023 - arxiv.org
Toxicity is an increasingly common and severe issue in online spaces. Consequently, a rich
line of machine learning research over the past decade has focused on computationally …

Quantifying the persona effect in LLM simulations

T Hu, N Collier - arXiv preprint arXiv:2402.10811, 2024 - arxiv.org
Large language models (LLMs) have shown remarkable promise in simulating human
language use and behavior. In this study, we delve into the intersection of persona variables …

Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates

A Davani, M Díaz, D Baker, V Prabhakaran - The 2024 ACM Conference …, 2024 - dl.acm.org
Recent years have seen substantial investments in AI-based tools designed to detect
offensive language at scale, aiming to moderate social media platforms and ensure safety …

Improving context-aware preference modeling for language models

S Pitis, Z Xiao, NL Roux, A Sordoni - arXiv preprint arXiv:2407.14916, 2024 - arxiv.org
While finetuning language models from pairwise preferences has proven remarkably
effective, the underspecified nature of natural language presents critical challenges. Direct …