Language models can solve computer tasks

G Kim, P Baldi, S McAleer - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Agents capable of carrying out general tasks on a computer can improve efficiency and
productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally …

Augmented language models: a survey

G Mialon, R Dessì, M Lomeli, C Nalmpantis… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey reviews works in which language models (LMs) are augmented with reasoning
skills and the ability to use tools. The former is defined as decomposing a potentially …

Do as I can, not as I say: Grounding language in robotic affordances

M Ahn, A Brohan, N Brown, Y Chebotar… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models can encode a wealth of semantic knowledge about the world. Such
knowledge could be extremely useful to robots aiming to act upon high-level, temporally …

Video pretraining (VPT): Learning to act by watching unlabeled online videos

B Baker, I Akkaya, P Zhokhov… - Advances in …, 2022 - proceedings.neurips.cc
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for
training models with broad, general capabilities for text, images, and other modalities …

AndroidInTheWild: A large-scale dataset for Android device control

C Rawles, A Li, D Rodriguez… - Advances in Neural …, 2024 - proceedings.neurips.cc
There is a growing interest in device-control systems that can interpret human natural
language instructions and execute them on a digital device by directly controlling its user …

WebShop: Towards scalable real-world web interaction with grounded language agents

S Yao, H Chen, J Yang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Most existing benchmarks for grounding language in interactive environments either lack
realistic linguistic elements, or prove difficult to scale up due to substantial human …

A real-world WebAgent with planning, long context understanding, and program synthesis

I Gur, H Furuta, A Huang, M Safdari, Y Matsuo… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-trained large language models (LLMs) have recently achieved better generalization and
sample efficiency in autonomous web navigation. However, the performance on real-world …

From pixels to UI actions: Learning to follow instructions via graphical user interfaces

P Shaw, M Joshi, J Cohan, J Berant… - Advances in …, 2023 - proceedings.neurips.cc
Much of the previous work towards digital agents for graphical user interfaces (GUIs) has
relied on text-based representations (derived from HTML or other structured data sources) …

A generalist neural algorithmic learner

B Ibarz, V Kurin, G Papamakarios… - Learning on graphs …, 2022 - proceedings.mlr.press
The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks,
especially in a way that generalises out of distribution. While recent years have seen a surge …

Autonomous evaluation and refinement of digital agents

J Pan, Y Zhang, N Tomlin, Y Zhou, S Levine… - arXiv preprint arXiv …, 2024 - arxiv.org
We show that domain-general automatic evaluators can significantly improve the
performance of agents for web navigation and device control. We experiment with multiple …