The recent surge in research interest in applying large language models (LLMs) to decision-making tasks has flourished by leveraging the extensive world knowledge embedded in …
Graphical User Interface (GUI) agents are designed to automate complex tasks on digital devices, such as smartphones and desktops. Most existing GUI agents interact with the …
Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging. In this paper, we propose Motif, a general method to interface such …
Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging …
Building agents with large language models (LLMs) for computer control is a burgeoning research area, where the agent receives computer states and performs actions to complete …
Large language models (LLMs) have been successfully adapted for interactive decision-making tasks like web navigation. While achieving decent performance, previous methods …
Large language models (LLMs) have shown increasing capacity at planning and executing a high-level goal in a live computer environment (e.g., MiniWoB++). To perform a task, recent …
Video creation has become increasingly popular, yet the expertise and effort required for editing often pose barriers to beginners. In this paper, we explore the integration of large …
Developers and quality assurance testers often rely on manual testing to test accessibility features throughout the product lifecycle. Unfortunately, manual testing can be tedious, often …