Weak-to-strong generalization: Eliciting strong capabilities with weak supervision

C Burns, P Izmailov, JH Kirchner, B Baker… - arXiv preprint arXiv …, 2023 - arxiv.org
Widely used alignment techniques, such as reinforcement learning from human feedback
(RLHF), rely on the ability of humans to supervise model behavior, for example, to evaluate …

Weak-to-strong reasoning

Y Yang, Y Ma, P Liu - arXiv preprint arXiv:2407.13647, 2024 - arxiv.org
When large language models (LLMs) exceed human-level capabilities, it becomes
increasingly challenging to provide full-scale and accurate supervision for these models …

Your Weak LLM is Secretly a Strong Teacher for Alignment

L Tao, Y Li - arXiv preprint arXiv:2409.08813, 2024 - arxiv.org
The burgeoning capabilities of large language models (LLMs) have underscored the need
for alignment to ensure these models act in accordance with human values and intentions …

MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs

A Opedal, H Shirakami, B Schölkopf, A Saparov… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) can solve arithmetic word problems with high accuracy, but
little is known about how well they generalize to problems that are more complex than the …

EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM?

A Agrawal, M Ding, Z Che, C Deng, A Satheesh… - arXiv preprint arXiv …, 2024 - arxiv.org
How can we harness the collective capabilities of multiple Large Language Models (LLMs)
to create an even more powerful model? This question forms the foundation of our research …

Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models

A Muhamed, M Diab, V Smith - arXiv preprint arXiv:2411.00743, 2024 - arxiv.org
Understanding and mitigating the potential risks associated with foundation models (FMs)
hinges on developing effective interpretability methods. Sparse Autoencoders (SAEs) have …

Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?

Z Yang, Y Zhang, T Liu, J Yang, J Lin, C Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated impressive capabilities, but still suffer
from inconsistency issues (e.g., LLMs can react differently to disturbances like rephrasing or …

Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

M Ding, C Deng, J Choo, Z Wu, A Agrawal… - arXiv preprint arXiv …, 2024 - arxiv.org
While generalization over tasks from easy to hard is crucial for profiling large language models
(LLMs), datasets with fine-grained difficulty annotations for each problem across a broad …

Aligning LLMs with Domain Invariant Reward Models

D Wu, S Choudhury - arXiv preprint arXiv:2501.00911, 2025 - arxiv.org
Aligning large language models (LLMs) to human preferences is challenging in domains
where preference data is unavailable. We address the problem of learning reward models …

Evaluating Fine-Tuning Efficiency of Human-Inspired Learning Strategies in Medical Question Answering

Y Yang, AM Bean, R McCraith, A Mahdi - arXiv preprint arXiv:2408.07888, 2024 - arxiv.org
Fine-tuning Large Language Models (LLMs) incurs considerable training costs, driving the
need for data-efficient training with optimised data ordering. Human-inspired strategies offer …