Reasoning like program executors

X Pi, Q Liu, B Chen, M Ziyadi, Z Lin, Q Fu, Y Gao… - arXiv preprint arXiv …, 2022 - arxiv.org
Reasoning over natural language is a long-standing goal for the research community.
However, studies have shown that existing language models are inadequate in reasoning …

DocMath-Eval: Evaluating math reasoning capabilities of LLMs in understanding financial documents

Y Zhao, Y Long, H Liu, R Kamoi, L Nan… - Proceedings of the …, 2024 - aclanthology.org
Recent LLMs have demonstrated remarkable performance in solving exam-like math word
problems. However, the degree to which these numerical reasoning skills are effective in …

PACIFIC: towards proactive conversational question answering over tabular and textual data in finance

Y Deng, W Lei, W Zhang, W Lam, TS Chua - arXiv preprint arXiv …, 2022 - arxiv.org
To facilitate conversational question answering (CQA) over hybrid contexts in finance, we
present a new dataset, named PACIFIC. Compared with existing CQA datasets, PACIFIC …

KnowledgeFMath: A Knowledge-Intensive Math Reasoning Dataset in Finance Domains

Y Zhao, H Liu, Y Long, R Zhang, C Zhao… - Proceedings of the …, 2024 - aclanthology.org
We introduce KnowledgeFMath, a novel benchmark designed to evaluate LLMs' capabilities
in solving knowledge-intensive math reasoning problems. Compared to prior works, this …

Answering numerical reasoning questions in table-text hybrid contents with graph-based encoder and tree-based decoder

F Lei, S He, X Li, J Zhao, K Liu - arXiv preprint arXiv:2209.07692, 2022 - arxiv.org
In the real-world question answering scenarios, hybrid form combining both tabular and
textual contents has attracted more and more attention, among which numerical reasoning …

Towards complex document understanding by discrete reasoning

F Zhu, W Lei, F Feng, C Wang, H Zhang… - Proceedings of the 30th …, 2022 - dl.acm.org
Document Visual Question Answering (VQA) aims to answer questions over visually-rich
documents. In this work, we introduce a new Document VQA dataset, named TAT-DQA …

MenatQA: A new dataset for testing the temporal comprehension and reasoning abilities of large language models

Y Wei, Y Su, H Ma, X Yu, F Lei, Y Zhang, J Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have shown nearly saturated performance on many natural
language processing (NLP) tasks. As a result, it is natural for people to believe that LLMs …

ReasTAP: Injecting table reasoning skills during pre-training via synthetic reasoning examples

Y Zhao, L Nan, Z Qi, R Zhang, D Radev - arXiv preprint arXiv:2210.12374, 2022 - arxiv.org
Reasoning over tabular data requires both table structure understanding and a broad set of
table reasoning skills. Current models with table-specific architectures and pre-training …

DocMath-Eval: Evaluating numerical reasoning capabilities of LLMs in understanding long documents with tabular data

Y Zhao, Y Long, H Liu, L Nan, L Chen, R Kamoi… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent LLMs have demonstrated remarkable performance in solving exam-like math word
problems. However, the degree to which these numerical reasoning skills are effective in …

TAT-LLM: A specialized language model for discrete reasoning over tabular and textual data

F Zhu, Z Liu, F Feng, C Wang, M Li, TS Chua - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we address question answering (QA) over a hybrid of tabular and textual data
that are very common content on the Web (e.g. SEC filings), where discrete reasoning …