A data-driven approach for learning to control computers

G Kim, P Baldi, S McAleer - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Agents capable of carrying out general tasks on a computer can improve efficiency and
productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally …

Cited by 304 · All 6 versions

Augmented language models: a survey

G Mialon, R Dessì, M Lomeli, C Nalmpantis… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey reviews works in which language models (LMs) are augmented with reasoning
skills and the ability to use tools. The former is defined as decomposing a potentially …

Cited by 502 · All 3 versions

Do as I can, not as I say: Grounding language in robotic affordances

M Ahn, A Brohan, N Brown, Y Chebotar… - arXiv preprint arXiv …, 2022 - arxiv.org

Large language models can encode a wealth of semantic knowledge about the world. Such
knowledge could be extremely useful to robots aiming to act upon high-level, temporally …

Cited by 1469 · All 2 versions

Video PreTraining (VPT): Learning to act by watching unlabeled online videos

B Baker, I Akkaya, P Zhokhov… - Advances in …, 2022 - proceedings.neurips.cc

Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for
training models with broad, general capabilities for text, images, and other modalities …

Cited by 284 · All 6 versions

AndroidInTheWild: A large-scale dataset for Android device control

C Rawles, A Li, D Rodriguez… - Advances in Neural …, 2024 - proceedings.neurips.cc

There is a growing interest in device-control systems that can interpret human natural
language instructions and execute them on a digital device by directly controlling its user …

Cited by 131 · All 8 versions

WebShop: Towards scalable real-world web interaction with grounded language agents

S Yao, H Chen, J Yang… - Advances in Neural …, 2022 - proceedings.neurips.cc

Most existing benchmarks for grounding language in interactive environments either lack
realistic linguistic elements, or prove difficult to scale up due to substantial human …

Cited by 334 · All 7 versions

A real-world WebAgent with planning, long context understanding, and program synthesis

I Gur, H Furuta, A Huang, M Safdari, Y Matsuo… - arXiv preprint arXiv …, 2023 - arxiv.org

Pre-trained large language models (LLMs) have recently achieved better generalization and
sample efficiency in autonomous web navigation. However, the performance on real-world …

Cited by 173 · All 4 versions

From pixels to UI actions: Learning to follow instructions via graphical user interfaces

P Shaw, M Joshi, J Cohan, J Berant… - Advances in …, 2023 - proceedings.neurips.cc

Much of the previous work towards digital agents for graphical user interfaces (GUIs) has
relied on text-based representations (derived from HTML or other structured data sources) …

Cited by 66 · All 5 versions

A generalist neural algorithmic learner

B Ibarz, V Kurin, G Papamakarios… - Learning on graphs …, 2022 - proceedings.mlr.press

The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks,
especially in a way that generalises out of distribution. While recent years have seen a surge …

Cited by 69 · All 5 versions

Autonomous evaluation and refinement of digital agents

J Pan, Y Zhang, N Tomlin, Y Zhou, S Levine… - arXiv preprint arXiv …, 2024 - arxiv.org

We show that domain-general automatic evaluators can significantly improve the
performance of agents for web navigation and device control. We experiment with multiple …

Cited by 35 · All 2 versions