MAF: Multi-aspect feedback for improving reasoning in large language models

D Nathani, D Wang, L Pan, WY Wang - arXiv preprint arXiv:2310.12426, 2023 - arxiv.org
Language Models (LMs) have shown impressive performance in various natural language
tasks. However, when it comes to natural language reasoning, LMs still face challenges …

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - Transactions of the …, 2024 - direct.mit.edu
While large language models (LLMs) have shown remarkable effectiveness in various NLP
tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A …

GLoRe: When, where, and how to improve LLM reasoning via global and local refinements

A Havrilla, S Raparthy, C Nalmpantis… - arXiv preprint arXiv …, 2024 - arxiv.org
State-of-the-art language models can exhibit impressive reasoning refinement capabilities
on math, science or coding tasks. However, recent work demonstrates that even the best …

CriticBench: Benchmarking LLMs for Critique-Correct Reasoning

Z Lin, Z Gou, T Liang, R Luo, H Liu, Y Yang - arXiv preprint arXiv …, 2024 - arxiv.org
The ability of Large Language Models (LLMs) to critique and refine their reasoning is crucial
for their application in evaluation, feedback provision, and self-improvement. This paper …

A & B == B & A: Triggering logical reasoning failures in large language models

Y Wan, W Wang, Y Yang, Y Yuan, J Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have propelled Artificial
Intelligence (AI) to new heights, enabling breakthroughs in various tasks such as writing …

Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination?

B Li, B Zhou, F Wang, X Fu, D Roth… - Proceedings of the 2024 …, 2024 - aclanthology.org
Despite the high performances of large language models (LLMs) across numerous
benchmarks, recent research has unveiled their suffering from hallucinations and unfaithful …

DyVal: Dynamic evaluation of large language models for reasoning tasks

K Zhu, J Chen, J Wang, NZ Gong, D Yang… - The Twelfth International …, 2023 - openreview.net
Large language models (LLMs) have achieved remarkable performance in various
evaluation benchmarks. However, concerns are raised about potential data contamination in …

At Which Training Stage Does Code Data Help LLMs Reasoning?

Y Ma, Y Liu, Y Yu, Y Zhang, Y Jiang, C Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have exhibited remarkable reasoning capabilities and
become the foundation of language technologies. Inspired by the great success of code data …

Towards reasoning in large language models: A survey

J Huang, KCC Chang - arXiv preprint arXiv:2212.10403, 2022 - arxiv.org
Reasoning is a fundamental aspect of human intelligence that plays a crucial role in
activities such as problem solving, decision making, and critical thinking. In recent years …

Automatically correcting large language models: Surveying the landscape of diverse self-correction strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable performance across a wide
array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent …