WebArena: A realistic web environment for building autonomous agents

S Zhou, FF Xu, H Zhu, X Zhou, R Lo, A Sridhar… - arXiv preprint arXiv …, 2023 - arxiv.org
With advances in generative AI, there is now potential for autonomous agents to manage
daily tasks via natural language commands. However, current agents are primarily created …

ExpeL: LLM agents are experiential learners

A Zhao, D Huang, Q Xu, M Lin, YJ Liu… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
The recent surge in research interest in applying large language models (LLMs) to decision-
making tasks has flourished by leveraging the extensive world knowledge embedded in …

SeeClick: Harnessing GUI grounding for advanced visual GUI agents

K Cheng, Q Sun, Y Chu, F Xu, Y Li, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Graphical User Interface (GUI) agents are designed to automate complex tasks on digital
devices, such as smartphones and desktops. Most existing GUI agents interact with the …

Motif: Intrinsic motivation from artificial intelligence feedback

M Klissarov, P D'Oro, S Sodhani, R Raileanu… - arXiv preprint arXiv …, 2023 - arxiv.org
Exploring rich environments and evaluating one's actions without prior knowledge is
immensely challenging. In this paper, we propose Motif, a general method to interface such …

AssistGUI: Task-Oriented PC Graphical User Interface Automation

D Gao, L Ji, Z Bai, M Ouyang, P Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Graphical User Interface (GUI) automation holds significant promise for assisting
users with complex tasks, thereby boosting human productivity. Existing works leveraging …

Synapse: Trajectory-as-exemplar prompting with memory for computer control

L Zheng, R Wang, X Wang, B An - The Twelfth International …, 2023 - openreview.net
Building agents with large language models (LLMs) for computer control is a burgeoning
research area, where the agent receives computer states and performs actions to complete …

LASER: LLM agent with state-space exploration for web navigation

K Ma, H Zhang, H Wang, X Pan, W Yu, D Yu - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have been successfully adapted for interactive decision-
making tasks like web navigation. While achieving decent performance, previous methods …

A Zero-Shot Language Agent for Computer Control with Structured Reflection

T Li, G Li, Z Deng, B Wang, Y Li - arXiv preprint arXiv:2310.08740, 2023 - arxiv.org
Large language models (LLMs) have shown increasing capacity at planning and executing
a high-level goal in a live computer environment (e.g., MiniWoB++). To perform a task, recent …

LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing

B Wang, Y Li, Z Lv, H Xia, Y Xu, R Sodhi - Proceedings of the 29th …, 2024 - dl.acm.org
Video creation has become increasingly popular, yet the expertise and effort required for
editing often pose barriers to beginners. In this paper, we explore the integration of large …

AXNav: Replaying accessibility tests from natural language

M Taeb, A Swearngin, E Schoop, R Cheng… - Proceedings of the CHI …, 2024 - dl.acm.org
Developers and quality assurance testers often rely on manual testing to test accessibility
features throughout the product lifecycle. Unfortunately, manual testing can be tedious, often …