Evaluating large language models: A comprehensive survey

Z Guo, R Jin, C Liu, Y Huang, D Shi, L Yu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a broad
spectrum of tasks. They have attracted significant attention and been deployed in numerous …

Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents

W Chen, Y Su, J Zuo, C Yang… - arXiv preprint …, 2023 - … .itic-sci.com
Autonomous agents empowered by Large Language Models (LLMs) have undergone
significant improvements, enabling them to generalize across a broad spectrum of tasks …

Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors

W Chen, Y Su, J Zuo, C Yang, C Yuan… - The Twelfth …, 2023 - openreview.net
Autonomous agents empowered by Large Language Models (LLMs) have undergone
significant improvements, enabling them to generalize across a broad spectrum of tasks …

SocioDojo: Building lifelong analytical agents with real-world text and time series

J Cheng, P Chin - The Twelfth International Conference on Learning …, 2024 - openreview.net
We introduce SocioDojo, an open-ended lifelong learning environment for developing ready-
to-deploy autonomous agents capable of performing human-like analysis and decision …

From interaction to impact: Towards safer AI agents through understanding and evaluating UI operation impacts

ZJ Zhang, E Schoop, J Nichols, A Mahajan… - arXiv preprint arXiv …, 2024 - arxiv.org
With advances in generative AI, there is increasing work towards creating autonomous
agents that can manage daily tasks by operating user interfaces (UIs). While prior research …

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

L Jing, Z Huang, X Wang, W Yao, W Yu, K Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) have
demonstrated impressive language/vision reasoning abilities, igniting the recent trend of …

Towards More Factual Large Language Models: Parametric and Non-parametric Approaches

Z Jiang - 2024 - kilthub.cmu.edu
Large language models (LLMs) are increasingly important in assisting people to access
information, ranging from simple factoid questions such as “where is the world's largest ice …