From pixels to UI actions: Learning to follow instructions via graphical user interfaces

S Zhou, FF Xu, H Zhu, X Zhou, R Lo, A Sridhar… - arXiv preprint arXiv …, 2023 - arxiv.org

With advances in generative AI, there is now potential for autonomous agents to manage
daily tasks via natural language commands. However, current agents are primarily created …

Cited by 158 Related articles All 4 versions

[PDF] aaai.org

ExpeL: LLM agents are experiential learners

A Zhao, D Huang, Q Xu, M Lin, YJ Liu… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

The recent surge in research interest in applying large language models (LLMs) to decision-
making tasks has flourished by leveraging the extensive world knowledge embedded in …

Cited by 98 Related articles All 3 versions

[PDF] arxiv.org

SeeClick: Harnessing GUI grounding for advanced visual GUI agents

K Cheng, Q Sun, Y Chu, F Xu, Y Li, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Graphical User Interface (GUI) agents are designed to automate complex tasks on digital
devices, such as smartphones and desktops. Most existing GUI agents interact with the …

Cited by 30 Related articles All 3 versions

[PDF] arxiv.org

Motif: Intrinsic motivation from artificial intelligence feedback

M Klissarov, P D'Oro, S Sodhani, R Raileanu… - arXiv preprint arXiv …, 2023 - arxiv.org

Exploring rich environments and evaluating one's actions without prior knowledge is
immensely challenging. In this paper, we propose Motif, a general method to interface such …

Cited by 30 Related articles All 6 versions

[PDF] thecvf.com

AssistGUI: Task-Oriented PC Graphical User Interface Automation

D Gao, L Ji, Z Bai, M Ouyang, P Li… - Proceedings of the …, 2024 - openaccess.thecvf.com

Graphical User Interface (GUI) automation holds significant promise for assisting
users with complex tasks thereby boosting human productivity. Existing works leveraging …

Cited by 1 Related articles

[PDF] openreview.net

Synapse: Trajectory-as-exemplar prompting with memory for computer control

L Zheng, R Wang, X Wang, B An - The Twelfth International …, 2023 - openreview.net

Building agents with large language models (LLMs) for computer control is a burgeoning
research area, where the agent receives computer states and performs actions to complete …

Cited by 21 Related articles All 5 versions

[PDF] arxiv.org

LASER: LLM agent with state-space exploration for web navigation

K Ma, H Zhang, H Wang, X Pan, W Yu, D Yu - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have been successfully adapted for interactive decision-
making tasks like web navigation. While achieving decent performance, previous methods …

Cited by 17 Related articles All 4 versions

[PDF] arxiv.org

A Zero-Shot Language Agent for Computer Control with Structured Reflection

T Li, G Li, Z Deng, B Wang, Y Li - arXiv preprint arXiv:2310.08740, 2023 - arxiv.org

Large language models (LLMs) have shown increasing capacity at planning and executing
a high-level goal in a live computer environment (e.g. MiniWoB++). To perform a task, recent …

Cited by 8 Related articles All 5 versions

LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing

B Wang, Y Li, Z Lv, H Xia, Y Xu, R Sodhi - Proceedings of the 29th …, 2024 - dl.acm.org

Video creation has become increasingly popular, yet the expertise and effort required for
editing often pose barriers to beginners. In this paper, we explore the integration of large …

Cited by 11 Related articles All 3 versions

[PDF] acm.org

AXNav: Replaying accessibility tests from natural language

M Taeb, A Swearngin, E Schoop, R Cheng… - Proceedings of the CHI …, 2024 - dl.acm.org

Developers and quality assurance testers often rely on manual testing to test accessibility
features throughout the product lifecycle. Unfortunately, manual testing can be tedious, often …

Cited by 11 Related articles All 5 versions
