A survey of deep learning for mathematical reasoning

P Lu, L Qiu, W Yu, S Welleck, KW Chang - arXiv preprint arXiv:2212.10535, 2022 - arxiv.org
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in
various fields, including science, engineering, finance, and everyday life. The development …

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

Solving quantitative reasoning problems with language models

A Lewkowycz, A Andreassen… - Advances in …, 2022 - proceedings.neurips.cc
Language models have achieved remarkable performance on a wide range of
tasks that require natural language understanding. Nevertheless, state-of-the-art models …

LeanDojo: Theorem proving with retrieval-augmented language models

K Yang, A Swope, A Gu, R Chalamala… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) have shown promise in proving formal theorems using proof
assistants such as Lean. However, existing methods are difficult to reproduce or build on …

Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI

M Abbasian, E Khatibi, I Azimi, D Oniani… - NPJ Digital …, 2024 - nature.com
Generative Artificial Intelligence is set to revolutionize healthcare delivery by
transforming traditional patient care into a more personalized, efficient, and proactive …

Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks

D Kang, X Li, I Stoica, C Guestrin… - 2024 IEEE Security …, 2024 - ieeexplore.ieee.org
Recent advances in instruction-following large language models (LLMs) have led to
dramatic improvements in a range of NLP tasks. Unfortunately, we find that the same …

Measuring mathematical problem solving with the MATH dataset

D Hendrycks, C Burns, S Kadavath, A Arora… - arXiv preprint arXiv …, 2021 - arxiv.org
Many intellectual endeavors require mathematical problem solving, but this skill remains
beyond the capabilities of computers. To measure this ability in machine learning models …

Language models show human-like content effects on reasoning

I Dasgupta, AK Lampinen, SCY Chan… - arXiv preprint arXiv …, 2022 - arxiv.org
Abstract reasoning is a key ability for an intelligent system. Large language models (LMs)
achieve above-chance performance on abstract reasoning tasks, but exhibit many …

Frozen pretrained transformers as universal computation engines

K Lu, A Grover, P Abbeel, I Mordatch - Proceedings of the AAAI …, 2022 - ojs.aaai.org
We investigate the capability of a transformer pretrained on natural language to generalize
to other modalities with minimal finetuning--in particular, without finetuning of the self …

Vision transformers provably learn spatial structure

S Jelassi, M Sander, Y Li - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Vision Transformers (ViTs) have recently achieved comparable or superior
performance to Convolutional neural networks (CNNs) in computer vision. This empirical …