A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Self-rewarding language models

W Yuan, RY Pang, K Cho, S Sukhbaatar, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
We posit that to achieve superhuman agents, future models require superhuman feedback
in order to provide an adequate training signal. Current approaches commonly train reward …

LLM-based NLG evaluation: Current status and challenges

M Gao, X Hu, J Ruan, X Pu, X Wan - arXiv preprint arXiv:2402.01383, 2024 - arxiv.org
Evaluating natural language generation (NLG) is a vital but challenging problem in artificial
intelligence. Traditional evaluation metrics mainly capturing content (e.g., n-gram) overlap …

Evaluating large language models at evaluating instruction following

Z Zeng, J Yu, T Gao, Y Meng, T Goyal… - arXiv preprint arXiv …, 2023 - arxiv.org
As research in large language models (LLMs) continues to accelerate, LLM-based
evaluation has emerged as a scalable and cost-effective alternative to human evaluations …

Self-discover: Large language models self-compose reasoning structures

P Zhou, J Pujara, X Ren, X Chen, HT Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-
intrinsic reasoning structures to tackle complex reasoning problems that are challenging for …

A comprehensive survey on instruction following

R Lou, K Zhang, W Yin - arXiv preprint arXiv:2303.10475, 2023 - arxiv.org
Task semantics can be expressed by a set of input-output examples or a piece of textual
instruction. Conventional machine learning approaches for natural language processing …

Beyond chatbots: Explorellm for structured thoughts and personalized model responses

X Ma, S Mishra, A Liu, SY Su, J Chen… - Extended Abstracts of …, 2024 - dl.acm.org
Large language model (LLM) powered chatbots are primarily text-based today, and impose
a large interactional cognitive load, especially for exploratory or sensemaking tasks such as …

Topologies of reasoning: Demystifying chains, trees, and graphs of thoughts

M Besta, F Memedi, Z Zhang, R Gerstenberger… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of natural language processing (NLP) has witnessed significant progress in recent
years, with a notable focus on improving large language models' (LLM) performance through …

Distilling System 2 into System 1

P Yu, J Xu, J Weston, I Kulikov - arXiv preprint arXiv:2407.06023, 2024 - arxiv.org
Large language models (LLMs) can spend extra compute during inference to generate
intermediate thoughts, which helps to produce better final responses. Since Chain-of …

Concept -- An Evaluation Protocol on Conversation Recommender Systems with System- and User-centric Factors

C Huang, P Qin, Y Deng, W Lei, J Lv… - arXiv preprint arXiv …, 2024 - arxiv.org
The conversational recommendation system (CRS) has been criticized for its user
experience in real-world scenarios, despite recent significant progress achieved in …