- Academic resource search

Spring: Studying papers and reasoning to play games

Y Wu, SY Min, S Prabhumoye, Y Bisk… - Advances in …, 2024 - proceedings.neurips.cc

Open-world survival games pose significant challenges for AI algorithms due to their multi-tasking,
deep exploration, and goal prioritization requirements. Despite reinforcement …

Cited by 11 · Related articles · All 5 versions

[PDF] arxiv.org

AgentKit: Flow Engineering with Graphs, not Coding

Y Wu, Y Fan, SY Min, S Prabhumoye, S McAleer… - arXiv preprint arXiv …, 2024 - arxiv.org

We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents.
AgentKit offers a unified framework for explicitly constructing a complex "thought process" …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Uncertainty-driven Exploration Strategies for Online Grasp Learning

Y Shi, P Schillinger, M Gabriel, A Kuss… - arXiv preprint arXiv …, 2023 - arxiv.org

Existing grasp prediction approaches are mostly based on offline learning, while, ignored
the exploratory grasp learning during online adaptation to new picking scenarios, i.e., unseen …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

A Jesson, C Lu, G Gupta, A Filos, JN Foerster… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper introduces an effective and practical step toward approximate Bayesian
inference in on-policy actor-critic deep reinforcement learning. This step manifests as three …

Cited by 1 · Related articles · All 4 versions

[PDF] arxiv.org

The Generalization Gap in Offline Reinforcement Learning

I Mediratta, Q You, M Jiang, R Raileanu - arXiv preprint arXiv:2312.05742, 2023 - arxiv.org

Despite recent progress in offline learning, these methods are still trained and tested on the
same environment. In this paper, we compare the generalization abilities of widely used …

Bad habits: Policy confounding and out-of-trajectory generalization in RL

M Suau, MTJ Spaan, FA Oliehoek - arXiv preprint arXiv:2306.02419, 2023 - arxiv.org

Reinforcement learning agents may sometimes develop habits that are effective only when
specific policies are followed. After an initial exploration phase in which agents try out …

Cited by 2 · Related articles · All 8 versions

[PDF] arxiv.org

Enhancing Agent Learning through World Dynamics Modeling

Z Sun, H Shi, MA Côté, G Berseth, X Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org

While large language models (LLMs) have been increasingly deployed across tasks in
language understanding and interactive decision-making, their impressive performance is …

[PDF] arxiv.org

Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models

A Sicilia, H Kim, KR Chandu, M Alikhani… - arXiv preprint arXiv …, 2024 - arxiv.org

Effective interlocutors account for the uncertain goals, beliefs, and emotions of others. But
even the best human conversationalist cannot perfectly anticipate the trajectory of a …

A Study of Generalization in Offline Reinforcement Learning

I Mediratta, Q You, M Jiang… - NeurIPS 2023 Workshop …, 2023 - openreview.net

Despite the recent progress in offline reinforcement learning (RL) algorithms, agents are
usually trained and tested on the same environment. In this paper, we perform an in-depth …

Cited by 1 · Related articles

[PDF] arxiv.org

Improving Policy Optimization via ε-Retrain

L Marzari, C Liu, PL Donti, E Marchesini - arXiv preprint arXiv:2406.08315, 2024 - arxiv.org

We present $\varepsilon$-retrain, an exploration strategy designed to encourage a
behavioral preference while optimizing policies with monotonic improvement guarantees. To …
