The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

MAVIS: Mathematical visual instruction tuning

R Zhang, X Wei, D Jiang, Y Zhang, Z Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal Large Language Models (MLLMs) have recently emerged as a significant focus
in academia and industry. Despite their proficiency in general multi-modal scenarios, the …

Step-DPO: Step-wise preference optimization for long-chain reasoning of LLMs

X Lai, Z Tian, Y Chen, S Yang, X Peng, J Jia - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical reasoning presents a significant challenge for Large Language Models
(LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring …

Building math agents with multi-turn iterative preference learning

W Xiong, C Shi, J Shen, A Rosenberg, Z Qin… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have shown that large language models' (LLMs) mathematical problem-
solving capabilities can be enhanced by integrating external tools, such as code …

Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement

A Yang, B Zhang, B Hui, B Gao, B Yu, C Li… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present a series of math-specific large language models: Qwen2.5-Math
and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in …

MMLU-Pro: A more robust and challenging multi-task language understanding benchmark

Y Wang, X Ma, G Zhang, Y Ni, A Chandra… - arXiv preprint arXiv …, 2024 - arxiv.org
In the age of large-scale language models, benchmarks like the Massive Multitask
Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI …

Towards Effective and Efficient Continual Pre-training of Large Language Models

J Chen, Z Chen, J Wang, K Zhou, Y Zhu, J Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Continual pre-training (CPT) has been an important approach for adapting language models
to specific domains or tasks. To make the CPT approach more traceable, this paper presents …

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

J Ni, F Xue, X Yue, Y Deng, M Shah, K Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluating large language models (LLMs) is challenging. Traditional ground-truth-based
benchmarks fail to capture the comprehensiveness and nuance of real-world queries, while …

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

Y Zhang, X Chen, B Jin, S Wang, S Ji, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
In many scientific fields, large language models (LLMs) have revolutionized the way in
which text and other modalities of data (e.g., molecules and proteins) are handled, achieving …

Improving llm reasoning through scaling inference computation with collaborative verification

Z Liang, Y Liu, T Niu, X Zhang, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite significant advancements in the general capability of large language models
(LLMs), they continue to struggle with consistent and accurate reasoning, especially in …