WebArena: A realistic web environment for building autonomous agents

S Zhou, FF Xu, H Zhu, X Zhou, R Lo, A Sridhar… - arXiv preprint arXiv …, 2023 - arxiv.org
With generative AI advances, the exciting potential for autonomous agents to manage daily
tasks via natural language commands has emerged. However, current agents are primarily …

LEVER: Learning to verify language-to-code generation with execution

A Ni, S Iyer, D Radev, V Stoyanov… - International …, 2023 - proceedings.mlr.press
The advent of large language models trained on code (code LLMs) has led to significant
progress in language-to-code generation. State-of-the-art approaches in this area combine …

Language models of code are few-shot commonsense learners

A Madaan, S Zhou, U Alon, Y Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
We address the general task of structured commonsense reasoning: given a natural
language input, the goal is to generate a graph such as an event--or a reasoning-graph. To …

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

A Ni, P Yin, Y Zhao, M Riddell, T Feng… - Transactions of the …, 2024 - direct.mit.edu
Recently, large language models (LLMs), especially those that are pretrained on code, have
demonstrated strong capabilities in generating programs from natural language inputs …

Synatra: Turning indirect knowledge into direct demonstrations for digital agents at scale

T Ou, FF Xu, A Madaan, J Liu, R Lo, A Sridhar… - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs can now act as autonomous agents that interact with digital environments and
complete specific objectives (e.g., arranging an online meeting). However, accuracy is still far …

SteP: Stacked LLM policies for web actions

P Sodhi, SRK Branavan, Y Artzi… - First Conference on …, 2024 - openreview.net
Performing tasks on the web presents fundamental challenges to large language models
(LLMs), including combinatorially large open-world tasks and variations across web …

Multi-level compositional reasoning for interactive instruction following

S Bhambri, B Kim, J Choi - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org
Robotic agents performing domestic chores by natural language directives are required to
master the complex job of navigating environment and interacting with objects in the …

HeaP: Hierarchical policies for web actions using LLMs

P Sodhi, SRK Branavan, R McDonald - arXiv preprint arXiv:2310.03720, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities in performing a
range of instruction following tasks in few and zero-shot settings. However, teaching LLMs to …

Exploring the Capacity of Pretrained Language Models for Reasoning about Actions and Change

W He, C Huang, Z Xiao, Y Liu - … of the 61st Annual Meeting of the …, 2023 - aclanthology.org
Reasoning about actions and change (RAC) is essential to understand and interact
with the ever-changing environment. Previous AI research has shown the importance of …

SGL: Symbolic Goal Learning in a Hybrid, Modular Framework for Human Instruction Following

R Xu, H Chen, Y Lin, PA Vela - IEEE Robotics and Automation …, 2022 - ieeexplore.ieee.org
This paper investigates human instruction following for robotic manipulation via a hybrid,
modular system with symbolic and connectionist elements. Symbolic methods build modular …