Evaluating language models for mathematics through interactions

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Cited by 683 Related articles All 3 versions

[PDF] acm.org

A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org

Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Cited by 2071 Related articles All 4 versions

[PDF] arxiv.org

MetaMath: Bootstrap your own mathematical questions for large language models

L Yu, W Jiang, H Shi, J Yu, Z Liu, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have pushed the limits of natural language understanding
and exhibited excellent problem-solving ability. Despite the great success, most existing …

Cited by 411 Related articles All 5 versions

[PDF] arxiv.org

Llemma: An open language model for mathematics

Z Azerbayev, H Schoelkopf, K Paster… - arXiv preprint arXiv …, 2023 - arxiv.org

We present Llemma, a large language model for mathematics. We continue pretraining
Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing …

Cited by 253 Related articles All 7 versions

[PDF] arxiv.org

Can LLMs generate novel research ideas? A large-scale human study with 100+ NLP researchers

C Si, D Yang, T Hashimoto - arXiv preprint arXiv:2409.04109, 2024 - arxiv.org

Recent advancements in large language models (LLMs) have sparked optimism about their
potential to accelerate scientific discovery, with a growing number of works proposing …

Cited by 51 Related articles All 5 versions

[PDF] arxiv.org

Building machines that learn and think with people

KM Collins, I Sucholutsky, U Bhatt, K Chandra… - Nature human …, 2024 - nature.com

What do we want from machine intelligence? We envision machines that are not just tools
for thought but partners in thought: reasonable, insightful, knowledgeable, reliable and …

Cited by 15 Related articles All 11 versions

[PDF] arxiv.org

OpenWebMath: An open dataset of high-quality mathematical web text

K Paster, MD Santos, Z Azerbayev, J Ba - arXiv preprint arXiv:2310.06786, 2023 - arxiv.org

There is growing evidence that pretraining on high quality, carefully thought-out tokens such
as code or mathematics plays an important role in improving the reasoning abilities of large …

Cited by 54 Related articles All 5 versions

[PDF] arxiv.org

Towards responsible development of generative AI for education: An evaluation-driven approach

I Jurenka, M Kunesch, KR McKee, D Gillick… - arXiv preprint arXiv …, 2024 - arxiv.org

A major challenge facing the world is the provision of equitable and universal access to
quality education. Recent advances in generative AI (gen AI) have created excitement about …

Cited by 26 Related articles All 3 versions

[HTML] sciencedirect.com

Does ChatGPT enhance student learning? A systematic review and meta-analysis of experimental studies

R Deng, M Jiang, X Yu, Y Lu, S Liu - Computers & Education, 2024 - Elsevier

Chat Generative Pre-Trained Transformer (ChatGPT) has generated excitement
and concern in education. While cross-sectional studies have highlighted correlations …

Cited by 2 Related articles All 2 versions

[PDF] osf.io

From computation to adjudication: Evaluating large language model judges on mathematical reasoning and precision calculation

D Yanid, A Davenport, X Carmichael, N Thompson - 2024 - files.osf.io

Recent developments in language models have sparked interest in their potential
applications beyond natural language tasks, including domains that require precise …

Cited by 33 Related articles All 4 versions
