Self-play fine-tuning converts weak language models to strong language models

Z Chen, Y Deng, H Yuan, K Ji, Q Gu - arXiv preprint arXiv:2401.01335, 2024 - arxiv.org
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is
pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the …

A survey on knowledge distillation of large language models

X Xu, M Li, C Tao, T Shen, R Cheng, J Li, C Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents an in-depth exploration of knowledge distillation (KD) techniques
within the realm of Large Language Models (LLMs), spotlighting the pivotal role of KD in …

Self-play preference optimization for language model alignment

Y Wu, Z Sun, H Yuan, K Ji, Y Yang, Q Gu - arXiv preprint arXiv:2405.00675, 2024 - arxiv.org
Traditional reinforcement learning from human feedback (RLHF) approaches relying on
parametric models like the Bradley-Terry model fall short in capturing the intransitivity and …

Iterative reasoning preference optimization

RY Pang, W Yuan, K Cho, H He, S Sukhbaatar… - arXiv preprint arXiv …, 2024 - arxiv.org
Iterative preference optimization methods have recently been shown to perform well for
general instruction tuning tasks, but typically make little improvement on reasoning tasks …

Understanding the performance gap between online and offline alignment algorithms

Y Tang, DZ Guo, Z Zheng, D Calandriello… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning from human feedback (RLHF) is the canonical framework for large
language model alignment. However, the rising popularity of offline alignment algorithms …

V-STaR: Training verifiers for self-taught reasoners

A Hosseini, X Yuan, N Malkin, A Courville… - arXiv preprint arXiv …, 2024 - arxiv.org
Common self-improvement approaches for large language models (LLMs), such as STaR
(Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve …

Masked thought: Simply masking partial reasoning steps can improve mathematical reasoning learning of language models

C Chen, X Wang, TE Lin, A Lv, Y Wu, X Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
In reasoning tasks, even a minor error can cascade into inaccurate results, leading to
suboptimal performance of large language models in such domains. Earlier fine-tuning …

R2E: Turning any GitHub Repository into a Programming Agent Environment

N Jain, M Shetty, T Zhang, K Han, K Sen… - Forty-first International …, 2024 - openreview.net
While Large Language Models' (LLMs) coding capabilities have advanced rapidly,
corresponding evaluation benchmarks on real-world programming setups are yet to catch …

LLM2LLM: Boosting LLMs with novel iterative data enhancement

N Lee, T Wattanawong, S Kim, K Mangalam… - arXiv preprint arXiv …, 2024 - arxiv.org
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast
majority of natural language processing tasks. While many real-world applications still …

MathGenie: Generating synthetic data with question back-translation for enhancing mathematical reasoning of LLMs

Z Lu, A Zhou, H Ren, K Wang, W Shi, J Pan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have exhibited great potential in mathematical reasoning.
However, there remains a performance gap in this area between existing open-source …