Can knowledge graphs reduce hallucinations in LLMs?: A survey

G Agrawal, T Kumarage, Z Alghamdi, H Liu - arXiv preprint arXiv …, 2023 - arxiv.org
Contemporary LLMs are prone to producing hallucinations, stemming mainly from the
knowledge gaps within the models. To address this critical limitation, researchers employ …

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

Tailoring self-rationalizers with multi-reward distillation

S Ramnath, B Joshi, S Hallinan, X Lu, LH Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LMs) are capable of generating free-text rationales to aid question
answering. However, prior work 1) suggests that useful self-rationalization is emergent only …

How Interpretable are Reasoning Explanations from Prompting Large Language Models?

YW Jie, R Satapathy, GS Mong, E Cambria - arXiv preprint arXiv …, 2024 - arxiv.org
Prompt Engineering has garnered significant attention for enhancing the performance of
large language models across a multitude of tasks. Techniques such as the Chain-of …

Leveraging training data in few-shot prompting for numerical reasoning

Z Jie, W Lu - arXiv preprint arXiv:2305.18170, 2023 - arxiv.org
Chain-of-thought (CoT) prompting with large language models has proven effective in
numerous natural language processing tasks, but designing prompts that generalize well to …

SocREval: Large language models with the socratic method for reference-free reasoning evaluation

H He, H Zhang, D Roth - Findings of the Association for …, 2024 - aclanthology.org
To comprehensively gauge the capacity of current models for complex reasoning, it is crucial
to assess their step-by-step reasoning in a scalable manner. Established reference-based …

PathFinder: Guided search over multi-step reasoning paths

O Golovneva, S O'Brien, R Pasunuru, T Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With recent advancements in large language models, methods like chain-of-thought
prompting to elicit reasoning chains have been shown to improve results on reasoning …

Evaluating Mathematical Reasoning Beyond Accuracy

S Xia, X Li, Y Liu, T Wu, P Liu - arXiv preprint arXiv:2404.05692, 2024 - arxiv.org
The leaderboard of Large Language Models (LLMs) in mathematical tasks has been
continuously updated. However, the majority of evaluations focus solely on the final results …

CLOMO: Counterfactual logical modification with large language models

Y Huang, R Hong, H Zhang, W Shao, Z Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
In this study, we delve into the realm of counterfactual reasoning capabilities of large
language models (LLMs). Our primary objective is to cultivate the counterfactual thought …

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

S Hao, Y Gu, H Luo, T Liu, X Shao, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs)
to address complex problems and enhance robustness and interpretability. Despite the flux …