A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework

Q Wu, G Bansal, J Zhang, Y Wu, S Zhang, E Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
This technical report presents AutoGen, a new framework that enables development of LLM
applications using multiple agents that can converse with each other to solve tasks. AutoGen …

Large language models for education: A survey and outlook

S Wang, T Xu, H Li, C Zhang, J Liang, J Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Large Language Models (LLMs) has brought in a new era of possibilities in
the realm of education. This survey paper summarizes the various technologies of LLMs in …

Large language models for mathematical reasoning: Progresses and challenges

J Ahn, R Verma, R Lou, D Liu, R Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive
capabilities of human intelligence. In recent times, there has been a notable surge in the …

ChatCoT: Tool-augmented chain-of-thought reasoning on chat-based large language models

Z Chen, K Zhou, B Zhang, Z Gong, WX Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Although large language models (LLMs) have achieved excellent performance in a variety
of evaluation benchmarks, they still struggle in complex reasoning tasks which require …

Advancing the search frontier with AI agents

RW White - Communications of the ACM, 2024 - dl.acm.org

Brain in a vat: On missing pieces towards artificial general intelligence in large language models

Y Ma, C Zhang, SC Zhu - arXiv preprint arXiv:2307.03762, 2023 - arxiv.org
In this perspective paper, we first comprehensively review existing evaluations of Large
Language Models (LLMs) using both standardized tests and ability-oriented benchmarks …

The impact of large language models on scientific discovery: a preliminary study using GPT-4

Microsoft Research AI4Science, Microsoft Azure Quantum - arXiv preprint arXiv:2311.07361, 2023 - arxiv.org
In recent years, groundbreaking advancements in natural language processing have
culminated in the emergence of powerful large language models (LLMs), which have …

Training language model agents without modifying language models

S Zhang, J Zhang, J Liu, L Song, C Wang… - arXiv e …, 2024 - ui.adsabs.harvard.edu
Researchers and practitioners have recently reframed powerful Large Language Models
(LLMs) as agents, enabling them to automate complex tasks largely via the use of …

Improving large language model fine-tuning for solving math problems

Y Liu, A Singh, CD Freeman, JD Co-Reyes… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite their success in many natural language tasks, solving math problems remains a
significant challenge for large language models (LLMs). A large gap exists between LLMs' …