The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

MAVIS: Mathematical visual instruction tuning

R Zhang, X Wei, D Jiang, Y Zhang, Z Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal Large Language Models (MLLMs) have recently emerged as a significant focus
in academia and industry. Despite their proficiency in general multi-modal scenarios, the …

Step-DPO: Step-wise preference optimization for long-chain reasoning of LLMs

X Lai, Z Tian, Y Chen, S Yang, X Peng, J Jia - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical reasoning presents a significant challenge for Large Language Models
(LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring …

Building math agents with multi-turn iterative preference learning

W Xiong, C Shi, J Shen, A Rosenberg, Z Qin… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have shown that large language models' (LLMs) mathematical problem-
solving capabilities can be enhanced by integrating external tools, such as code …

Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement

A Yang, B Zhang, B Hui, B Gao, B Yu, C Li… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present a series of math-specific large language models: Qwen2.5-Math
and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in …

MMLU-Pro: A more robust and challenging multi-task language understanding benchmark

Y Wang, X Ma, G Zhang, Y Ni, A Chandra… - arXiv preprint arXiv …, 2024 - arxiv.org
In the age of large-scale language models, benchmarks like the Massive Multitask
Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI …

Towards Effective and Efficient Continual Pre-training of Large Language Models

J Chen, Z Chen, J Wang, K Zhou, Y Zhu, J Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Continual pre-training (CPT) has been an important approach for adapting language models
to specific domains or tasks. To make the CPT approach more traceable, this paper presents …

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

J Ni, F Xue, X Yue, Y Deng, M Shah, K Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluating large language models (LLMs) is challenging. Traditional ground-truth-based
benchmarks fail to capture the comprehensiveness and nuance of real-world queries, while …

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

Y Zhang, X Chen, B Jin, S Wang, S Ji, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
In many scientific fields, large language models (LLMs) have revolutionized the way in
which text and other modalities of data (e.g., molecules and proteins) are handled, achieving …

Improving llm reasoning through scaling inference computation with collaborative verification

Z Liang, Y Liu, T Niu, X Zhang, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite significant advancements in the general capability of large language models
(LLMs), they continue to struggle with consistent and accurate reasoning, especially in …