QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in the recent years,
there has been much work on benchmark datasets needed to track modeling progress …

Benchmarks for automated commonsense reasoning: A survey

E Davis - ACM Computing Surveys, 2023 - dl.acm.org
More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …

Time-aware language models as temporal knowledge bases

B Dhingra, JR Cole, JM Eisenschlos… - Transactions of the …, 2022 - direct.mit.edu
Many facts come with an expiration date, from the name of the President to the basketball
team Lebron James plays for. However, most language models (LMs) are trained on …

Towards benchmarking and improving the temporal reasoning capability of large language models

Q Tan, HT Ng, L Bing - arXiv preprint arXiv:2306.08952, 2023 - arxiv.org
Reasoning about time is of fundamental importance. Many facts are time-dependent. For
example, athletes change teams from time to time, and different government officials are …

Language models can improve event prediction by few-shot abductive reasoning

X Shi, S Xue, K Wang, F Zhou… - Advances in …, 2024 - proceedings.neurips.cc
Large language models have shown astonishing performance on a wide range of reasoning
tasks. In this paper, we investigate whether they could reason about real-world events and …

Test of time: Instilling video-language models with a sense of time

P Bagad, M Tapaswi… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Modelling and understanding time remains a challenge in contemporary video
understanding models. With language emerging as a key driver towards powerful …

StreamingQA: A benchmark for adaptation to new knowledge over time in question answering models

A Liska, T Kocisky, E Gribovskaya… - International …, 2022 - proceedings.mlr.press
Knowledge and language understanding of models evaluated through question
answering (QA) has been usually studied on static snapshots of knowledge, like Wikipedia …

A dataset for answering time-sensitive questions

W Chen, X Wang, WY Wang - arXiv preprint arXiv:2108.06314, 2021 - arxiv.org
Time is an important dimension in our physical world. Lots of facts can evolve with respect to
time. For example, the US President might change every four years. Therefore, it is important …

[CITATION] Reasoning with transformer-based models: Deep learning, but shallow reasoning

C Helwe, C Clavel, F Suchanek - International Conference on …, 2021 - imt.hal.science
Recent years have seen impressive performance of transformer-based models on different
natural language processing tasks. However, it is not clear to what degree the transformers …

TIMEDIAL: Temporal commonsense reasoning in dialog

L Qin, A Gupta, S Upadhyay, L He, Y Choi… - arXiv preprint arXiv …, 2021 - arxiv.org
Everyday conversations require understanding everyday events, which in turn, requires
understanding temporal commonsense concepts interwoven with those events. Despite …