MAF: Multi-aspect feedback for improving reasoning in large language models

D Nathani, D Wang, L Pan, WY Wang - arXiv preprint arXiv:2310.12426, 2023 - arxiv.org
Language Models (LMs) have shown impressive performance in various natural language
tasks. However, when it comes to natural language reasoning, LMs still face challenges …

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - Transactions of the …, 2024 - direct.mit.edu
While large language models (LLMs) have shown remarkable effectiveness in various NLP
tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A …

GLoRe: When, where, and how to improve LLM reasoning via global and local refinements

A Havrilla, S Raparthy, C Nalmpantis… - arXiv preprint arXiv …, 2024 - arxiv.org
State-of-the-art language models can exhibit impressive reasoning refinement capabilities
on math, science or coding tasks. However, recent work demonstrates that even the best …

CriticBench: Benchmarking LLMs for Critique-Correct Reasoning

Z Lin, Z Gou, T Liang, R Luo, H Liu, Y Yang - arXiv preprint arXiv …, 2024 - arxiv.org
The ability of Large Language Models (LLMs) to critique and refine their reasoning is crucial
for their application in evaluation, feedback provision, and self-improvement. This paper …

A & B == B & A: Triggering logical reasoning failures in large language models

Y Wan, W Wang, Y Yang, Y Yuan, J Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have propelled Artificial
Intelligence (AI) to new heights, enabling breakthroughs in various tasks such as writing …

Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination?

B Li, B Zhou, F Wang, X Fu, D Roth… - Proceedings of the 2024 …, 2024 - aclanthology.org
Despite the high performances of large language models (LLMs) across numerous
benchmarks, recent research has unveiled their suffering from hallucinations and unfaithful …

DyVal: Dynamic evaluation of large language models for reasoning tasks

K Zhu, J Chen, J Wang, NZ Gong, D Yang… - The Twelfth International …, 2023 - openreview.net
Large language models (LLMs) have achieved remarkable performance in various
evaluation benchmarks. However, concerns are raised about potential data contamination in …

At Which Training Stage Does Code Data Help LLMs Reasoning?

Y Ma, Y Liu, Y Yu, Y Zhang, Y Jiang, C Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have exhibited remarkable reasoning capabilities and
become the foundation of language technologies. Inspired by the great success of code data …

Towards reasoning in large language models: A survey

J Huang, KCC Chang - arXiv preprint arXiv:2212.10403, 2022 - arxiv.org
Reasoning is a fundamental aspect of human intelligence that plays a crucial role in
activities such as problem solving, decision making, and critical thinking. In recent years …

Automatically correcting large language models: Surveying the landscape of diverse self-correction strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable performance across a wide
array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent …