Large multimodal agents: A survey

J Xie, Z Chen, R Zhang, X Wan, G Li - arXiv preprint arXiv:2402.15116, 2024 - arxiv.org
Large language models (LLMs) have achieved superior performance in powering text-based
AI agents, endowing them with decision-making and reasoning abilities akin to …

Towards general computer control: A multimodal agent for Red Dead Redemption II as a case study

W Tan, Z Ding, W Zhang, B Li, B Zhou, J Yue… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the success in specific tasks and scenarios, existing foundation agents, empowered
by large models (LMs) and advanced tools, still cannot generalize to different scenarios …

AndroidWorld: A dynamic benchmarking environment for autonomous agents

C Rawles, S Clinckemaillie, Y Chang, J Waltz… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous agents that execute human tasks by controlling computers can enhance
human productivity and application accessibility. Yet, progress in this field will be driven by …

Automating the Enterprise with Foundation Models

M Wornow, A Narayan, K Opsahl-Ong… - arXiv preprint arXiv …, 2024 - arxiv.org
Automating enterprise workflows could unlock $4 trillion/year in productivity gains. Despite
being of interest to the data management community for decades, the ultimate vision of end …

Adaptive In-conversation Team Building for Language Model Agents

L Song, J Liu, J Zhang, S Zhang, A Luo, S Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Leveraging multiple large language model (LLM) agents has shown to be a promising
approach for tackling complex tasks, while the effective design of multiple agents for a …

TDAG: A multi-agent framework based on dynamic task decomposition and agent generation

Y Wang, Z Wu, J Yao, J Su - arXiv preprint arXiv:2402.10178, 2024 - arxiv.org
The emergence of Large Language Models (LLMs) like ChatGPT has inspired the
development of LLM-based agents capable of addressing complex, real-world tasks …

AgentStudio: A toolkit for building general virtual agents

L Zheng, Z Huang, Z Xue, X Wang, B An… - arXiv preprint arXiv …, 2024 - arxiv.org
Creating autonomous virtual agents capable of using arbitrary software on any digital device
remains a major challenge for artificial intelligence. Two key obstacles hinder progress …

Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models

F Xu, Q Sun, K Cheng, J Liu, Y Qiao, Z Wu - arXiv preprint arXiv …, 2024 - arxiv.org
One of the primary driving forces contributing to the superior performance of Large
Language Models (LLMs) is the extensive availability of human-annotated natural language …

Efficacy of Language Model Self-Play in Non-Zero-Sum Games

A Liao, N Tomlin, D Klein - arXiv preprint arXiv:2406.18872, 2024 - arxiv.org
Game-playing agents like AlphaGo have achieved superhuman performance through self-play,
which is theoretically guaranteed to yield optimal policies in competitive games …

Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks

M Wornow, A Narayan, B Viggiano, IS Khare… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing ML benchmarks lack the depth and diversity of annotations needed for evaluating
models on business process management (BPM) tasks. BPM is the practice of documenting …