Pre-trained language models for text generation: A survey

J Li, T Tang, WX Zhao, JY Nie, JR Wen - ACM Computing Surveys, 2024 - dl.acm.org
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …

Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review

B Chen, Z Zhang, N Langrené, S Zhu - arXiv preprint arXiv:2310.14735, 2023 - arxiv.org
This paper delves into the pivotal role of prompt engineering in unleashing the capabilities
of Large Language Models (LLMs). Prompt engineering is the process of structuring input …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond

J Yang, H Jin, R Tang, X Han, Q Feng, H Jiang… - ACM Transactions on …, 2024 - dl.acm.org
This article presents a comprehensive and practical guide for practitioners and end-users
working with Large Language Models (LLMs) in their downstream Natural Language …

Language models don't always say what they think: unfaithful explanations in chain-of-thought prompting

M Turpin, J Michael, E Perez… - Advances in Neural …, 2024 - proceedings.neurips.cc
Large Language Models (LLMs) can achieve strong performance on many tasks by
producing step-by-step reasoning before giving a final output, often referred to as chain-of …

Let's verify step by step

H Lightman, V Kosaraju, Y Burda, H Edwards… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, large language models have greatly improved in their ability to perform
complex multi-step reasoning. However, even state-of-the-art models still regularly produce …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

Trustworthy LLMs: A survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

WizardMath: Empowering mathematical reasoning for large language models via Reinforced Evol-Instruct

H Luo, Q Sun, C Xu, P Zhao, J Lou, C Tao… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs), such as GPT-4, have shown remarkable performance in
natural language processing (NLP) tasks, including challenging mathematical reasoning …

Reinforced self-training (ReST) for language modeling

C Gulcehre, TL Paine, S Srinivasan… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) can improve the quality of large
language model (LLM) outputs by aligning them with human preferences. We propose a …