Internal consistency and self-feedback in large language models: A survey

X Liang, S Song, Z Zheng, H Wang, Q Yu, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are expected to respond accurately but often exhibit
deficient reasoning or generate hallucinatory content. To address these issues, studies prefixed …

Chain of thoughtlessness: An analysis of CoT in planning

K Stechly, K Valmeekam, S Kambhampati - arXiv preprint arXiv …, 2024 - arxiv.org
Large language model (LLM) performance on reasoning problems typically does not
generalize out of distribution. Previous work has claimed that this can be mitigated by …

RATIONALYST: Pre-training Process-Supervision for Improving Reasoning

D Jiang, G Wang, Y Lu, A Wang, J Zhang, C Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The reasoning steps generated by LLMs might be incomplete, as they mimic logical leaps
common in everyday communication found in their pre-training data: underlying rationales …

From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models

E Nisioti, C Glanois, E Najarro, A Dai… - Artificial Life …, 2024 - direct.mit.edu
Large Language Models (LLMs) have taken the field of AI by storm, but their
adoption in the field of Artificial Life (ALife) has been, so far, relatively reserved. In this work …

Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation

JJ Ahn, R Kamoi, L Cheng, R Zhang, W Yin - arXiv preprint arXiv …, 2024 - arxiv.org
Mainstream LLM research has primarily focused on enhancing their generative capabilities.
However, even the most advanced LLMs experience uncertainty in their outputs, often …

Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks

J He, H Lin, Q Wang, Y Fung, H Ji - arXiv preprint arXiv:2410.04055, 2024 - arxiv.org
While Vision-Language Models (VLMs) have shown remarkable abilities in visual and
language reasoning tasks, they invariably generate flawed responses. Self-correction that …

What's Wrong? Refining Meeting Summaries with LLM Feedback

F Kirstein, T Ruas, B Gipp - arXiv preprint arXiv:2407.11919, 2024 - arxiv.org
Meeting summarization has become a critical task as digital encounters have become
common practice. Large language models (LLMs) show great potential in summarization …

WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment

J Ou, A Uzunoglu, B Van Durme… - arXiv preprint arXiv …, 2024 - arxiv.org
AI systems make decisions in physical environments through primitive actions or
affordances that are accessed via API calls. While deploying AI agents in the real world …

Just say what you want: only-prompting self-rewarding online preference optimization

R Xu, Z Liu, Y Liu, S Yan, Z Wang, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We address the challenge of online Reinforcement Learning from Human Feedback (RLHF)
with a focus on self-rewarding alignment methods. In online RLHF, obtaining feedback …

Divide-Verify-Refine: Aligning LLM Responses with Complex Instructions

X Zhang, X Tang, H Liu, Z Wu, Q He, D Lee… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies show that LLMs, particularly open-source models, struggle to follow complex
instructions with multiple constraints. Despite the importance, methods to improve LLMs' …