- Academic resource search

Large language models cannot self-correct reasoning yet

J Huang, X Chen, S Mishra, HS Zheng, AW Yu… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) have emerged as a groundbreaking technology with their
unparalleled text generation capabilities across various applications. Nevertheless …

Cited by 139 - Related articles - All 3 versions

[PDF] mit.edu

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - Transactions of the …, 2024 - direct.mit.edu

While large language models (LLMs) have shown remarkable effectiveness in various NLP
tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A …

Cited by 9 - Related articles - All 4 versions

[PDF] arxiv.org

Can ChatGPT defend its belief in truth? evaluating LLM reasoning via debate

B Wang, X Yue, H Sun - arXiv preprint arXiv:2305.13160, 2023 - arxiv.org

Large language models (LLMs) such as ChatGPT and GPT-4 have shown impressive
performance in complex reasoning tasks. However, it is difficult to know whether the models …

Cited by 23 - Related articles - All 6 versions

[PDF] arxiv.org

AutoAct: Automatic agent learning from scratch via self-planning

S Qiao, N Zhang, R Fang, Y Luo, W Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

Language agents have achieved considerable performance on various complex tasks.
Despite the incessant exploration in this field, existing language agent systems still struggle …

Cited by 15 - Related articles - All 3 versions

[PDF] arxiv.org

Multi-agent consensus seeking via large language models

H Chen, W Ji, L Xu, S Zhao - arXiv preprint arXiv:2310.20151, 2023 - arxiv.org

Multi-agent systems driven by large language models (LLMs) have shown promising
abilities for solving complex tasks in a collaborative manner. This work considers a …

Cited by 8 - Related articles - All 2 versions

[PDF] arxiv.org

Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?

Q Wang, Z Wang, Y Su, H Tong, Y Song - arXiv preprint arXiv:2402.18272, 2024 - arxiv.org

Recent progress in LLMs discussion suggests that multi-agent discussion improves the
reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic …

Cited by 6 - Related articles - All 2 versions

[PDF] arxiv.org

The consensus game: Language model generation via equilibrium search

AP Jacob, Y Shen, G Farina, J Andreas - arXiv preprint arXiv:2310.09139, 2023 - arxiv.org

When applied to question answering and other text generation tasks, language models
(LMs) may be queried generatively (by sampling answers from their output distribution) or …

Cited by 6 - Related articles - All 5 versions

[PDF] arxiv.org

Sentiment analysis through LLM negotiations

X Sun, X Li, S Zhang, S Wang, F Wu, J Li… - arXiv preprint arXiv …, 2023 - arxiv.org

A standard paradigm for sentiment analysis is to rely on a singular LLM and makes the
decision in a single round under the framework of in-context learning. This framework suffers …

Cited by 8 - Related articles - All 2 versions

[PDF] arxiv.org

SemEval-2024 Task 9: BRAINTEASER: A novel task defying common sense

Y Jiang, F Ilievski, K Ma - arXiv preprint arXiv:2404.16068, 2024 - arxiv.org

While vertical thinking relies on logical and commonsense reasoning, lateral thinking
requires systems to defy commonsense associations and overwrite them through …

Cited by 20 - Related articles - All 2 versions

[PDF] arxiv.org

Empowering biomedical discovery with AI agents

S Gao, A Fang, Y Huang, V Giunchiglia, A Noori… - arXiv preprint arXiv …, 2024 - arxiv.org

We envision 'AI scientists' as systems capable of skeptical learning and reasoning that
empower biomedical research through collaborative agents that integrate machine learning …

Cited by 5 - Related articles - All 2 versions
