Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models

YL Tuan, X Chen, EM Smith, L Martin, S Batra… - arXiv preprint arXiv:2404.01295, 2024 - arxiv.org
As large language models (LLMs) become widely accessible, the trade-off between safety and helpfulness can significantly impact user experience. A model that prioritizes safety may leave users feeling less engaged and assisted, while one that prioritizes helpfulness can potentially cause harm. Possible harms include teaching people how to build a bomb, exposing youth to inappropriate content, and hurting users' mental health. In this work, we propose to balance safety and helpfulness across diverse use cases by controlling both attributes in LLMs. We explore training-free and fine-tuning methods that do not require extra human annotations, and we analyze the challenges of controlling safety and helpfulness in LLMs. Our experiments demonstrate that our method can rewind a learned model and unlock its controllability.
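To make the idea of "controlling both attributes" concrete, below is a minimal sketch of one common way such controllability is implemented: conditioning generation on attribute control tokens prepended to the prompt. The control-token format (`<safety=...>`, `<helpfulness=...>`), the placeholder model name, and the `controlled_generate` helper are illustrative assumptions for this sketch, not the paper's exact interface; it only assumes a standard HuggingFace-style causal LM API.

```python
# Hypothetical sketch: attribute-conditioned generation via control tokens.
# The control-token format is an illustrative assumption, not the paper's
# actual method; it shows how desired safety/helpfulness levels could be
# expressed as a conditioning prefix.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model; any causal LM works here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def controlled_generate(prompt: str, safety: float, helpfulness: float) -> str:
    """Prepend desired attribute levels so the model can condition on them.

    A fine-tuned model would have seen such control prefixes during
    training; a training-free variant could instead express them as
    natural-language instructions in the prompt.
    """
    control = f"<safety={safety:.1f}> <helpfulness={helpfulness:.1f}> "
    inputs = tokenizer(control + prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens; return only the newly generated text.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

print(controlled_generate("How do fireworks work?", safety=0.9, helpfulness=0.7))
```

Under this framing, moving the two knobs independently lets a deployment pick its own operating point on the safety-helpfulness trade-off rather than inheriting a single fixed balance from training.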