Agent-Pro: Learning to evolve via policy-level reflection and optimization

W Zhang, K Tang, H Wu, M Wang, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models exhibit robust problem-solving capabilities for diverse tasks.
However, most LLM-based agents are designed as specific task solvers with sophisticated …

ClickAgent: Enhancing UI location capabilities of autonomous agents

J Hoscilowicz, B Maj, B Kozakiewicz… - arXiv preprint arXiv …, 2024 - arxiv.org
With the growing reliance on digital devices equipped with graphical user interfaces (GUIs),
such as computers and smartphones, the need for effective automation tools has become …

RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model

H Zhou, T Ji, L Sommerhalder, M Goerner… - arXiv preprint arXiv …, 2024 - arxiv.org
Minigolf is an exemplary real-world game for examining embodied intelligence, requiring
challenging spatial and kinodynamic understanding to putt the ball. Additionally, reflective …

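A survey of embodied intelligence systems based on large models

W Wang, N Tan, K Huang, Y Zhang, W Zheng, F Sun - Acta Automatica Sinica, 2025 - aas.net.cn
Owing to the recent rapid development of large-scale pre-trained models with world knowledge, embodied intelligence based on large models has achieved
good results across a wide range of tasks, demonstrating strong generalization ability and broad application prospects in many fields. In view of this …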

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Z Wang, S Cai, Z Mu, H Lin, C Zhang, X Liu, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-
world instruction-following agents in Minecraft. Compared to prior works that either emit …

Autonomous Mental Development at the Individual and Collective Levels: Concept and Challenges

M Lippi, S Mariani, M Martinelli, F Zambonelli - IEEE Access, 2024 - ieeexplore.ieee.org
The increasing complexity and unpredictability of many ICT scenarios let us envision that
future systems will have to dynamically learn how to act and adapt to face evolving situations …

A taxonomy of architecture options for foundation model-based agents: Analysis and decision model

J Zhou, Q Lu, J Chen, L Zhu, X Xu, Z Xing… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of AI technology has led to widespread applications of agent
systems across various domains. However, the need for detailed architecture design poses …

AI to publish knowledge: a tectonic shift

T Lemberger - EMBO reports, 2024 - embopress.org
The rise of generative AI will transform scientific publishing but it also poses risks. While AI
enables the dissemination of knowledge in computable form, preserving transparency and …

Position: Foundation Agents as the Paradigm Shift for Decision Making

X Liu, X Lou, J Jiao, J Zhang - arXiv preprint arXiv:2405.17009, 2024 - arxiv.org
Decision making demands intricate interplay between perception, memory, and reasoning to
discern optimal policies. Conventional approaches to decision making face challenges …

Smart Mobility with Agent-based Foundation Models: Towards Interactive and Collaborative Intelligent Vehicles

B Xia, P Xie, J Wang - IEEE Transactions on Intelligent Vehicles, 2024 - ieeexplore.ieee.org
This letter reports the insights gained during a Distributed/Decentralized Hybrid Workshop
on Foundation/Infrastructure Intelligence (FII), where we discussed the evolving role of …