Natural language reasoning, a survey

F Yu, H Zhang, P Tiwari, B Wang - ACM Computing Surveys, 2023 - dl.acm.org
This survey paper proposes a clearer view of natural language reasoning in the field of
Natural Language Processing (NLP), both conceptually and practically. Conceptually, we …

Evaluating large language models: A comprehensive survey

Z Guo, R Jin, C Liu, Y Huang, D Shi, L Yu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a broad
spectrum of tasks. They have attracted significant attention and been deployed in numerous …

Evaluating the logical reasoning ability of ChatGPT and GPT-4

H Liu, R Ning, Z Teng, J Liu, Q Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
Harnessing logical reasoning ability is a comprehensive natural language understanding
endeavor. With the release of Generative Pretrained Transformer 4 (GPT-4), highlighted as "…

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

Robustness gym: Unifying the NLP evaluation landscape

K Goel, N Rajani, J Vig, S Tan, J Wu, S Zheng… - arXiv preprint arXiv …, 2021 - arxiv.org
Despite impressive performance on standard benchmarks, deep neural networks are often
brittle when deployed in real-world systems. Consequently, recent research has focused on …

LogiQA 2.0 — an improved dataset for logical reasoning in natural language understanding

H Liu, J Liu, L Cui, Z Teng, N Duan… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
NLP research on logical reasoning regains momentum with the recent releases of a handful
of datasets, notably LogiQA and ReClor. Logical reasoning is exploited in many probing …

Towards faithful model explanation in NLP: A survey

Q Lyu, M Apidianaki, C Callison-Burch - Computational Linguistics, 2024 - direct.mit.edu
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to
understand. This has given rise to numerous efforts towards model explainability in recent …

Recursion in recursion: Two-level nested recursion for length generalization with scalability

J Ray Chowdhury, C Caragea - Advances in Neural …, 2024 - proceedings.neurips.cc
Binary Balanced Tree Recursive Neural Networks (BBT-RvNNs) enforce sequence
composition according to a preset balanced binary tree structure. Thus, their non-linear …

ANLIzing the adversarial natural language inference dataset

A Williams, T Thrush, D Kiela - arXiv preprint arXiv:2010.12729, 2020 - arxiv.org
We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-
scale human-and-model-in-the-loop natural language inference dataset collected over …

When LLMs meet cunning questions: A fallacy understanding benchmark for large language models

Y Li, Q Zhou, Y Luo, S Ma, Y Li, HT Zheng, X Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Large Language Models (LLMs) have made remarkable advances in language
understanding and generation. Following this, various benchmarks for measuring all kinds …