SPRING: Studying Papers and Reasoning to Play Games

Y Wu, SY Min, S Prabhumoye, Y Bisk… - Advances in …, 2024 - proceedings.neurips.cc
Open-world survival games pose significant challenges for AI algorithms due to their
multi-tasking, deep exploration, and goal prioritization requirements. Despite reinforcement …

AgentKit: Flow Engineering with Graphs, not Coding

Y Wu, Y Fan, SY Min, S Prabhumoye, S McAleer… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents.
AgentKit offers a unified framework for explicitly constructing a complex "thought process" …

Uncertainty-driven Exploration Strategies for Online Grasp Learning

Y Shi, P Schillinger, M Gabriel, A Kuss… - arXiv preprint arXiv …, 2023 - arxiv.org
Existing grasp prediction approaches are mostly based on offline learning and ignore
exploratory grasp learning during online adaptation to new picking scenarios, i.e., unseen …

ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

A Jesson, C Lu, G Gupta, A Filos, JN Foerster… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces an effective and practical step toward approximate Bayesian
inference in on-policy actor-critic deep reinforcement learning. This step manifests as three …

The Generalization Gap in Offline Reinforcement Learning

I Mediratta, Q You, M Jiang, R Raileanu - arXiv preprint arXiv:2312.05742, 2023 - arxiv.org
Despite recent progress in offline learning, these methods are still trained and tested on the
same environment. In this paper, we compare the generalization abilities of widely used …

Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL

M Suau, MTJ Spaan, FA Oliehoek - arXiv preprint arXiv:2306.02419, 2023 - arxiv.org
Reinforcement learning agents may sometimes develop habits that are effective only when
specific policies are followed. After an initial exploration phase in which agents try out …

Enhancing Agent Learning through World Dynamics Modeling

Z Sun, H Shi, MA Côté, G Berseth, X Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
While large language models (LLMs) have been increasingly deployed across tasks in
language understanding and interactive decision-making, their impressive performance is …

Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models

A Sicilia, H Kim, KR Chandu, M Alikhani… - arXiv preprint arXiv …, 2024 - arxiv.org
Effective interlocutors account for the uncertain goals, beliefs, and emotions of others. But
even the best human conversationalist cannot perfectly anticipate the trajectory of a …

A Study of Generalization in Offline Reinforcement Learning

I Mediratta, Q You, M Jiang… - NeurIPS 2023 Workshop …, 2023 - openreview.net
Despite the recent progress in offline reinforcement learning (RL) algorithms, agents are
usually trained and tested on the same environment. In this paper, we perform an in-depth …

Improving Policy Optimization via ε-Retrain

L Marzari, C Liu, PL Donti, E Marchesini - arXiv preprint arXiv:2406.08315, 2024 - arxiv.org
We present ε-retrain, an exploration strategy designed to encourage a
behavioral preference while optimizing policies with monotonic improvement guarantees. To …