Smaug: Fixing failure modes of preference optimisation with DPO-Positive

A Pal, D Karkhanis, S Dooley, M Roberts… - arXiv preprint arXiv …, 2024 - arxiv.org
Direct Preference Optimisation (DPO) is effective at significantly improving the performance
of large language models (LLMs) on downstream tasks such as reasoning, summarisation …
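For context, several of the entries below modify or analyse the same underlying objective. The standard DPO loss (Rafailov et al., 2023), which these works take as a starting point, is

$$\mathcal{L}_{\text{DPO}}(\pi_\theta;\pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\text{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\text{ref}}(y_l\mid x)}\right)\right],$$

where $y_w$ and $y_l$ are the preferred and dispreferred responses to prompt $x$ and $\beta$ controls deviation from the reference policy $\pi_{\text{ref}}$; the variants listed here change this loss or the preference data it is applied to.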

Relative preference optimization: Enhancing llm alignment through contrasting responses across identical and diverse prompts

Y Yin, Z Wang, Y Gu, H Huang, W Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
In the field of large language models (LLMs), aligning models with the diverse preferences
of users is a critical challenge. Direct Preference Optimization (DPO) has played a key role …

ReFT: Reasoning with reinforced fine-tuning

TQ Luong, X Zhang, Z Jie, P Sun, X Jin, H Li - arXiv preprint arXiv …, 2024 - arxiv.org
One way to enhance the reasoning capability of Large Language Models (LLMs) is to
conduct Supervised Fine-Tuning (SFT) using Chain-of-Thought (CoT) annotations. This …

Towards analyzing and understanding the limitations of DPO: A theoretical perspective

D Feng, B Qin, C Huang, Z Zhang, W Lei - arXiv preprint arXiv:2404.04626, 2024 - arxiv.org
Direct Preference Optimization (DPO), which derives reward signals directly from pairwise
preference data, has shown its effectiveness in aligning Large Language Models (LLMs) …

RaFe: Ranking Feedback Improves Query Rewriting for RAG

S Mao, Y Jiang, B Chen, X Li, P Wang, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
As Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG)
techniques have evolved, query rewriting has been widely incorporated into the RAG system …

Improving socratic question generation using data augmentation and preference optimization

NA Kumar, A Lan - arXiv preprint arXiv:2403.00199, 2024 - arxiv.org
The Socratic method is a way of guiding students toward solving a problem independently
without directly revealing the solution to the problem. Although this method has been shown …

PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models

D Jain, P Kumar, S Gehman, X Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in large language models (LLMs) have led to their extensive global
deployment, and ensuring their safety calls for comprehensive and multilingual toxicity …

Distributional Preference Alignment of LLMs via Optimal Transport

I Melnyk, Y Mroueh, B Belgodere, M Rigotti… - arXiv preprint arXiv …, 2024 - arxiv.org
Current LLM alignment techniques use pairwise human preferences at a sample level, and
as such, they do not imply an alignment on the distributional level. We propose in this paper …

CURATRON: Complete Robust Preference Data for Robust Alignment of Large Language Models

ST Nguyen, NU Naresh, T Tulabandhula - arXiv preprint arXiv:2403.02745, 2024 - arxiv.org
This paper addresses the challenges of aligning large language models (LLMs) with human
values via preference learning (PL), with a focus on the issues of incomplete and corrupted …

Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback

V Conitzer, R Freedman, J Heitzig, WH Holliday… - Forty-first International … - openreview.net
Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic
behavior, such as helping to commit crimes or producing racist text. One approach to fine …