Configurable Safety Tuning of Language Models with Synthetic Preference Data

V Gallego - arXiv preprint arXiv:2404.00495, 2024 - arxiv.org
State-of-the-art language model fine-tuning techniques, such as Direct Preference
Optimization (DPO), restrict user control by hard-coding predefined behaviors into the …

Configurable Safety Tuning of Language Models with Synthetic Preference Data

V Gallego - arXiv e-prints, 2024 - ui.adsabs.harvard.edu