Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Towards human-centered explainable AI: A survey of user studies for model explanations

Y Rong, T Leemann, TT Nguyen… - IEEE transactions on …, 2023 - ieeexplore.ieee.org
Explainable AI (XAI) is widely viewed as a sine qua non for ever-expanding AI research. A
better understanding of the needs of XAI users, as well as human-centered evaluations of …

ToolQA: A dataset for LLM question answering with external tools

Y Zhuang, Y Yu, K Wang, H Sun… - Advances in Neural …, 2023 - proceedings.neurips.cc
Large Language Models (LLMs) have demonstrated impressive performance in
various NLP tasks, but they still suffer from challenges such as hallucination and weak …

Towards reasoning in large language models: A survey

J Huang, KCC Chang - arXiv preprint arXiv:2212.10403, 2022 - arxiv.org
Reasoning is a fundamental aspect of human intelligence that plays a crucial role in
activities such as problem solving, decision making, and critical thinking. In recent years …

Large language model as attributed training data generator: A tale of diversity and bias

Y Yu, Y Zhuang, J Zhang, Y Meng… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) have been recently leveraged as training data generators
for various natural language processing (NLP) tasks. While previous research has explored …

Challenging BIG-Bench tasks and whether chain-of-thought can solve them

M Suzgun, N Scales, N Schärli, S Gehrmann… - arXiv preprint arXiv …, 2022 - arxiv.org
BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks
believed to be beyond the capabilities of current language models. Language models have …

Chain-of-thought prompting elicits reasoning in large language models

J Wei, X Wang, D Schuurmans… - Advances in neural …, 2022 - proceedings.neurips.cc
We explore how generating a chain of thought---a series of intermediate reasoning steps---
significantly improves the ability of large language models to perform complex reasoning. In …

Reasoning with language model prompting: A survey

S Qiao, Y Ou, N Zhang, X Chen, Y Yao, S Deng… - arXiv preprint arXiv …, 2022 - arxiv.org
Reasoning, as an essential ability for complex problem-solving, can provide back-end
support for various real-world applications, such as medical diagnosis, negotiation, etc. This …

Can language models learn from explanations in context?

AK Lampinen, I Dasgupta, SCY Chan… - arXiv preprint arXiv …, 2022 - arxiv.org
Language Models (LMs) can perform new tasks by adapting to a few in-context examples.
For humans, explanations that connect examples to task principles can improve learning …

The unreliability of explanations in few-shot prompting for textual reasoning

X Ye, G Durrett - Advances in neural information processing …, 2022 - proceedings.neurips.cc
Does prompting a large language model (LLM) like GPT-3 with explanations improve in-
context learning? We study this question on two NLP tasks that involve reasoning over text …