Benchmarks for automated commonsense reasoning: A survey

E Davis - ACM Computing Surveys, 2023 - dl.acm.org
More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …

The Goldilocks of pragmatic understanding: Fine-tuning strategy matters for implicature resolution by LLMs

L Ruis, A Khan, S Biderman, S Hooker… - Advances in …, 2024 - proceedings.neurips.cc
Despite widespread use of LLMs as conversational agents, evaluations of performance fail
to capture a crucial aspect of communication: interpreting language in context …

Understanding transformer memorization recall through idioms

A Haviv, I Cohen, J Gidron, R Schuster… - arXiv preprint arXiv …, 2022 - arxiv.org
To produce accurate predictions, language models (LMs) must balance between
generalization and memorization. Yet, little is known about the mechanism by which …

(QA)²: Question Answering with Questionable Assumptions

N Kim, PM Htut, SR Bowman, J Petty - arXiv preprint arXiv:2212.10003, 2022 - arxiv.org
Naturally occurring information-seeking questions often contain questionable assumptions--
assumptions that are false or unverifiable. Questions containing questionable assumptions …

BHASA: A holistic Southeast Asian linguistic and cultural evaluation suite for large language models

WQ Leong, JG Ngui, Y Susanto, H Rengarajan… - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid development of Large Language Models (LLMs) and the emergence of novel
abilities with scale have necessitated the construction of holistic, diverse and challenging …

Is this the real life? Is this just fantasy? The misleading success of simulating social interactions with LLMs

X Zhou, Z Su, T Eisape, H Kim, M Sap - arXiv preprint arXiv:2403.05020, 2024 - arxiv.org
Recent advances in large language models (LLMs) have enabled richer social simulations,
allowing for the study of various social phenomena. However, most recent work has used a …

Pragmatics in language grounding: Phenomena, tasks, and modeling approaches

D Fried, N Tomlin, J Hu, R Patel… - arXiv preprint arXiv …, 2022 - arxiv.org
People rely heavily on context to enrich meaning beyond what is literally said, enabling
concise but effective communication. To interact successfully and naturally with people, user …

Evaluating paraphrastic robustness in textual entailment models

D Verma, YK Lal, S Sinha, B Van Durme… - arXiv preprint arXiv …, 2023 - arxiv.org
We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE)
examples to evaluate whether models are robust to paraphrasing. We posit that if RTE …

Pregnant Questions: The Importance of Pragmatic Awareness in Maternal Health Question Answering

N Srikanth, R Sarkar, H Mane, E Aparicio… - Proceedings of the …, 2024 - aclanthology.org
Questions posed by information-seeking users often contain implicit false or potentially
harmful assumptions. In a high-risk domain such as maternal and infant health, a question …

Towards Understanding What Code Language Models Learned

T Ahmed, D Yu, C Huang, C Wang, P Devanbu… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-trained language models are effective in a variety of natural language tasks, but it has
been argued their capabilities fall short of fully learning meaning or understanding …