A survey of deep learning for mathematical reasoning

P Lu, L Qiu, W Yu, S Welleck, KW Chang - arXiv preprint arXiv:2212.10535, 2022 - arxiv.org
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in
various fields, including science, engineering, finance, and everyday life. The development …

MQuAKE: Assessing knowledge editing in language models via multi-hop questions

Z Zhong, Z Wu, CD Manning, C Potts… - arXiv preprint arXiv …, 2023 - arxiv.org
The information stored in large language models (LLMs) falls out of date quickly, and
retraining from scratch is often not an option. This has recently given rise to a range of …

Knowledge conflicts for LLMs: A survey

R Xu, Z Qi, Z Guo, C Wang, H Wang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey provides an in-depth analysis of knowledge conflicts for large language models
(LLMs), highlighting the complex challenges they encounter when blending contextual and …

Interactive natural language processing

Z Wang, G Zhang, K Yang, N Shi, W Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within
the field of NLP, aimed at addressing limitations in existing frameworks while aligning with …

Consistency analysis of ChatGPT

ME Jang, T Lukasiewicz - arXiv preprint arXiv:2303.06273, 2023 - arxiv.org
ChatGPT has gained huge popularity since its introduction. Its positive aspects have been
reported through many media platforms, and some analyses even showed that ChatGPT …

Conformal language modeling

V Quach, A Fisch, T Schuster, A Yala, JH Sohn… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose a novel approach to conformal prediction for generative language models
(LMs). Standard conformal prediction produces prediction sets--in place of single predictions …

Internal consistency and self-feedback in large language models: A survey

X Liang, S Song, Z Zheng, H Wang, Q Yu, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations.
To address these, studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and …

Cross-lingual consistency of factual knowledge in multilingual language models

J Qi, R Fernández, A Bisazza - arXiv preprint arXiv:2310.10378, 2023 - arxiv.org
Multilingual large-scale Pretrained Language Models (PLMs) have been shown to store
considerable amounts of factual knowledge, but large variations are observed across …

Human-like few-shot learning via Bayesian reasoning over natural language

K Ellis - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
A core tension in models of concept learning is that the model must carefully balance the
tractability of inference against the expressivity of the hypothesis class. Humans, however …

Benchmarking and improving generator-validator consistency of language models

XL Li, V Shrivastava, S Li, T Hashimoto… - arXiv preprint arXiv …, 2023 - arxiv.org
As of September 2023, ChatGPT correctly answers "what is 7+8" with 15, but when asked
"7+8=15, True or False" it responds with "False". This inconsistency between generating …