Z Guo, R Jin, C Liu, Y Huang, D Shi, L Yu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of tasks. They have attracted significant attention and been deployed in numerous …
Harnessing logical reasoning ability is a comprehensive natural language understanding endeavor. With the release of Generative Pretrained Transformer 4 (GPT-4), highlighted as "…
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …
Despite impressive performance on standard benchmarks, deep neural networks are often brittle when deployed in real-world systems. Consequently, recent research has focused on …
NLP research on logical reasoning regains momentum with the recent releases of a handful of datasets, notably LogiQA and Reclor. Logical reasoning is exploited in many probing …
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand. This has given rise to numerous efforts towards model explainability in recent …
Binary Balanced Tree Recursive Neural Networks (BBT-RvNNs) enforce sequence composition according to a preset balanced binary tree structure. Thus, their non-linear …
We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected over …
Y Li, Q Zhou, Y Luo, S Ma, Y Li, HT Zheng, X Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Large Language Models (LLMs) have made remarkable evolutions in language understanding and generation. Following this, various benchmarks for measuring all kinds …