Large language models cannot self-correct reasoning yet

J Huang, X Chen, S Mishra, HS Zheng, AW Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have emerged as a groundbreaking technology with their
unparalleled text generation capabilities across various applications. Nevertheless …

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - Transactions of the …, 2024 - direct.mit.edu
While large language models (LLMs) have shown remarkable effectiveness in various NLP
tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A …

Can ChatGPT defend its belief in truth? Evaluating LLM reasoning via debate

B Wang, X Yue, H Sun - arXiv preprint arXiv:2305.13160, 2023 - arxiv.org
Large language models (LLMs) such as ChatGPT and GPT-4 have shown impressive
performance in complex reasoning tasks. However, it is difficult to know whether the models …

AutoAct: Automatic agent learning from scratch via self-planning

S Qiao, N Zhang, R Fang, Y Luo, W Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Language agents have achieved considerable performance on various complex tasks.
Despite the incessant exploration in this field, existing language agent systems still struggle …

Multi-agent consensus seeking via large language models

H Chen, W Ji, L Xu, S Zhao - arXiv preprint arXiv:2310.20151, 2023 - arxiv.org
Multi-agent systems driven by large language models (LLMs) have shown promising
abilities for solving complex tasks in a collaborative manner. This work considers a …

Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?

Q Wang, Z Wang, Y Su, H Tong, Y Song - arXiv preprint arXiv:2402.18272, 2024 - arxiv.org
Recent progress in LLM discussion suggests that multi-agent discussion improves the
reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic …
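
For orientation, here is a minimal sketch of the multi-agent discussion loop that this line of work examines: several model instances answer independently, read each other's answers, optionally revise, and a majority vote decides. The `ask_model` stub and the agent/round counts are illustrative assumptions, not the protocol of the cited paper.

```python
from collections import Counter

def ask_model(agent_id: int, prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned answer for illustration."""
    return f"answer-from-agent-{agent_id % 2}"

def discuss(question: str, n_agents: int = 3, n_rounds: int = 2) -> str:
    # Round 0: each agent answers independently.
    answers = [ask_model(i, question) for i in range(n_agents)]
    for _ in range(n_rounds):
        # Each agent sees its peers' current answers and may revise its own.
        answers = [
            ask_model(i, f"{question}\nPeer answers: "
                         f"{[a for j, a in enumerate(answers) if j != i]}\nRevise if needed.")
            for i in range(n_agents)
        ]
    # Aggregate the final round by majority vote.
    return Counter(answers).most_common(1)[0][0]

print(discuss("What is 17 * 24?"))
```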

The consensus game: Language model generation via equilibrium search

AP Jacob, Y Shen, G Farina, J Andreas - arXiv preprint arXiv:2310.09139, 2023 - arxiv.org
When applied to question answering and other text generation tasks, language models
(LMs) may be queried generatively (by sampling answers from their output distribution) or …
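
The snippet contrasts two ways of querying an LM; the sketch below illustrates them generically (sampling an answer versus scoring fixed candidates), with `sample_answer` and `score_answer` as hypothetical stand-ins for real model calls. It does not implement the paper's equilibrium search.

```python
import random

def sample_answer(question: str) -> str:
    """Generative querying: pretend to sample from the model's output distribution."""
    return random.choice(["Paris", "Lyon"])

def score_answer(question: str, candidate: str) -> float:
    """Discriminative querying: pretend to score how well a candidate answers."""
    return 1.0 if candidate == "Paris" else 0.1

question = "What is the capital of France?"
generative = sample_answer(question)
candidates = ["Paris", "Lyon", "Marseille"]
discriminative = max(candidates, key=lambda c: score_answer(question, c))
print("generative:", generative, "| discriminative:", discriminative)
```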

Sentiment analysis through LLM negotiations

X Sun, X Li, S Zhang, S Wang, F Wu, J Li… - arXiv preprint arXiv …, 2023 - arxiv.org
A standard paradigm for sentiment analysis is to rely on a singular LLM and make the
decision in a single round under the framework of in-context learning. This framework suffers …
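
For reference, a minimal sketch of the "standard paradigm" this snippet describes: a single LLM makes the decision in one round via few-shot in-context learning. `call_llm` is a hypothetical stub; the negotiation procedure the paper proposes is not shown.

```python
FEW_SHOT = [
    ("The plot was dull and predictable.", "negative"),
    ("An absolute delight from start to finish.", "positive"),
]

def build_prompt(text: str) -> str:
    # Prepend labelled demonstrations, then ask for the label of the new review.
    demos = "\n".join(f"Review: {x}\nSentiment: {y}" for x, y in FEW_SHOT)
    return f"{demos}\nReview: {text}\nSentiment:"

def call_llm(prompt: str) -> str:
    """Placeholder for a single LLM call; returns a canned label for illustration."""
    return "positive"

print(call_llm(build_prompt("Great acting and a moving score.")))
```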

SemEval-2024 Task 9: BRAINTEASER: A novel task defying common sense

Y Jiang, F Ilievski, K Ma - arXiv preprint arXiv:2404.16068, 2024 - arxiv.org
While vertical thinking relies on logical and commonsense reasoning, lateral thinking
requires systems to defy commonsense associations and overwrite them through …

Empowering biomedical discovery with ai agents

S Gao, A Fang, Y Huang, V Giunchiglia, A Noori… - arXiv preprint arXiv …, 2024 - arxiv.org
We envision 'AI scientists' as systems capable of skeptical learning and reasoning that
empower biomedical research through collaborative agents that integrate machine learning …