Benchmarks for automated commonsense reasoning: A survey

E Davis - ACM Computing Surveys, 2023 - dl.acm.org
More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …

The Goldilocks of pragmatic understanding: Fine-tuning strategy matters for implicature resolution by LLMs

L Ruis, A Khan, S Biderman, S Hooker… - Advances in …, 2024 - proceedings.neurips.cc
Despite widespread use of LLMs as conversational agents, evaluations of performance fail
to capture a crucial aspect of communication: interpreting language in context …

Understanding transformer memorization recall through idioms

A Haviv, I Cohen, J Gidron, R Schuster… - arXiv preprint arXiv …, 2022 - arxiv.org
To produce accurate predictions, language models (LMs) must balance between
generalization and memorization. Yet, little is known about the mechanism by which …

(QA)²: Question Answering with Questionable Assumptions

N Kim, PM Htut, SR Bowman, J Petty - arXiv preprint arXiv:2212.10003, 2022 - arxiv.org
Naturally occurring information-seeking questions often contain questionable assumptions--
assumptions that are false or unverifiable. Questions containing questionable assumptions …

BHASA: A holistic Southeast Asian linguistic and cultural evaluation suite for large language models

WQ Leong, JG Ngui, Y Susanto, H Rengarajan… - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid development of Large Language Models (LLMs) and the emergence of novel
abilities with scale have necessitated the construction of holistic, diverse and challenging …

Is this the real life? Is this just fantasy? The misleading success of simulating social interactions with LLMs

X Zhou, Z Su, T Eisape, H Kim, M Sap - arXiv preprint arXiv:2403.05020, 2024 - arxiv.org
Recent advances in large language models (LLMs) have enabled richer social simulations,
allowing for the study of various social phenomena. However, most recent work has used a …

Pragmatics in language grounding: Phenomena, tasks, and modeling approaches

D Fried, N Tomlin, J Hu, R Patel… - arXiv preprint arXiv …, 2022 - arxiv.org
People rely heavily on context to enrich meaning beyond what is literally said, enabling
concise but effective communication. To interact successfully and naturally with people, user …

Evaluating paraphrastic robustness in textual entailment models

D Verma, YK Lal, S Sinha, B Van Durme… - arXiv preprint arXiv …, 2023 - arxiv.org
We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE)
examples to evaluate whether models are robust to paraphrasing. We posit that if RTE …

Pregnant Questions: The Importance of Pragmatic Awareness in Maternal Health Question Answering

N Srikanth, R Sarkar, H Mane, E Aparicio… - Proceedings of the …, 2024 - aclanthology.org
Questions posed by information-seeking users often contain implicit false or potentially
harmful assumptions. In a high-risk domain such as maternal and infant health, a question …

Towards Understanding What Code Language Models Learned

T Ahmed, D Yu, C Huang, C Wang, P Devanbu… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-trained language models are effective in a variety of natural language tasks, but it has
been argued their capabilities fall short of fully learning meaning or understanding …