Self-play fine-tuning converts weak language models to strong language models

Z Chen, Y Deng, H Yuan, K Ji, Q Gu - arXiv preprint arXiv:2401.01335, 2024 - arxiv.org
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is
pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the …

A survey on knowledge distillation of large language models

X Xu, M Li, C Tao, T Shen, R Cheng, J Li, C Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents an in-depth exploration of knowledge distillation (KD) techniques
within the realm of Large Language Models (LLMs), spotlighting the pivotal role of KD in …

Self-play preference optimization for language model alignment

Y Wu, Z Sun, H Yuan, K Ji, Y Yang, Q Gu - arXiv preprint arXiv:2405.00675, 2024 - arxiv.org
Traditional reinforcement learning from human feedback (RLHF) approaches relying on
parametric models like the Bradley-Terry model fall short in capturing the intransitivity and …

Iterative reasoning preference optimization

RY Pang, W Yuan, K Cho, H He, S Sukhbaatar… - arXiv preprint arXiv …, 2024 - arxiv.org
Iterative preference optimization methods have recently been shown to perform well for
general instruction tuning tasks, but typically make little improvement on reasoning tasks …

Understanding the performance gap between online and offline alignment algorithms

Y Tang, DZ Guo, Z Zheng, D Calandriello… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning from human feedback (RLHF) is the canonical framework for large
language model alignment. However, the rising popularity of offline alignment algorithms …

V-STaR: Training verifiers for self-taught reasoners

A Hosseini, X Yuan, N Malkin, A Courville… - arXiv preprint arXiv …, 2024 - arxiv.org
Common self-improvement approaches for large language models (LLMs), such as STaR
(Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve …

Masked thought: Simply masking partial reasoning steps can improve mathematical reasoning learning of language models

C Chen, X Wang, TE Lin, A Lv, Y Wu, X Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
In reasoning tasks, even a minor error can cascade into inaccurate results, leading to
suboptimal performance of large language models in such domains. Earlier fine-tuning …

R2E: Turning any GitHub Repository into a Programming Agent Environment

N Jain, M Shetty, T Zhang, K Han, K Sen… - Forty-first International …, 2024 - openreview.net
While Large Language Models' (LLMs) coding capabilities have advanced rapidly,
corresponding evaluation benchmarks on real-world programming setups are yet to catch …

LLM2LLM: Boosting LLMs with novel iterative data enhancement

N Lee, T Wattanawong, S Kim, K Mangalam… - arXiv preprint arXiv …, 2024 - arxiv.org
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast
majority of natural language processing tasks. While many real-world applications still …

MathGenie: Generating synthetic data with question back-translation for enhancing mathematical reasoning of LLMs

Z Lu, A Zhou, H Ren, K Wang, W Shi, J Pan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have exhibited great potential in mathematical reasoning.
However, there remains a performance gap in this area between existing open-source …