MiniF2F: a cross-system benchmark for formal olympiad-level mathematics

A survey of deep learning for mathematical reasoning

P Lu, L Qiu, W Yu, S Welleck, KW Chang - arXiv preprint arXiv:2212.10535, 2022 - arxiv.org

Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in
various fields, including science, engineering, finance, and everyday life. The development …

Cited by 87 - Related articles - All 6 versions

[PDF] arxiv.org

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org

Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Cited by 2022 - Related articles - All 4 versions

[PDF] nature.com

Solving olympiad geometry without human demonstrations

TH Trinh, Y Wu, QV Le, H He, T Luong - Nature, 2024 - nature.com

Proving mathematical theorems at the olympiad level represents a notable milestone in
human-level automated reasoning, owing to their reputed difficulty among the world's best …

Cited by 145 - Related articles - All 12 versions

[PDF] neurips.cc

LeanDojo: Theorem proving with retrieval-augmented language models

K Yang, A Swope, A Gu, R Chalamala… - Advances in …, 2024 - proceedings.neurips.cc

Large language models (LLMs) have shown promise in proving formal theorems using proof
assistants such as Lean. However, existing methods are difficult to reproduce or build on …

Cited by 112 - Related articles - All 9 versions

[PDF] arxiv.org

Llemma: An open language model for mathematics

Z Azerbayev, H Schoelkopf, K Paster… - arXiv preprint arXiv …, 2023 - arxiv.org

We present Llemma, a large language model for mathematics. We continue pretraining
Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing …

Cited by 126 - Related articles - All 7 versions

[PDF] neurips.cc

Autoformalization with large language models

Y Wu, AQ Jiang, W Li, M Rabe… - Advances in …, 2022 - proceedings.neurips.cc

Autoformalization is the process of automatically translating from natural language
mathematics to formal specifications and proofs. A successful autoformalization system …

Cited by 110 - Related articles - All 7 versions

[PDF] arxiv.org

Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org

This paper embarks on an exploration into the Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

Cited by 13 - Related articles - All 4 versions

[PDF] neurips.cc

Hypertree proof search for neural theorem proving

G Lample, T Lacroix, MA Lachaux… - Advances in neural …, 2022 - proceedings.neurips.cc

We propose an online training procedure for a transformer-based automated theorem
prover. Our approach leverages a new search algorithm, HyperTree Proof Search (HTPS) …

Cited by 87 - Related articles - All 9 versions

[PDF] arxiv.org

Formal mathematics statement curriculum learning

S Polu, JM Han, K Zheng, M Baksys… - arXiv preprint arXiv …, 2022 - arxiv.org

We explore the use of expert iteration in the context of language modeling applied to formal
mathematics. We show that at same compute budget, expert iteration, by which we mean …

Cited by 115 - Related articles - All 6 versions

[PDF] arxiv.org

Draft, sketch, and prove: Guiding formal theorem provers with informal proofs

AQ Jiang, S Welleck, JP Zhou, W Li, J Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

The formalization of existing mathematical proofs is a notoriously difficult process. Despite
decades of research on automation and proof assistants, writing formal proofs remains …

Cited by 91 - Related articles - All 5 versions
