Existing Deep Reinforcement Learning (DRL) algorithms suffer from sample inefficiency. Generally, episodic control-based approaches are solutions that leverage highly-rewarded …
G Wang, F Wu, X Zhang, T Chen - arXiv preprint arXiv:2401.00162, 2023 - arxiv.org
The sparsity of reward feedback remains a challenging problem in online deep reinforcement learning (DRL). Previous approaches have utilized temporal credit …
One of the long-term goals of reinforcement learning is to build intelligent agents capable of rapidly learning and flexibly transferring skills, similar to humans and animals. In this paper …
Non-parametric episodic memory can be used to quickly latch onto high-rewarded experience in reinforcement learning tasks. In contrast to parametric deep reinforcement …
We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on …
We introduce a constrained optimization method for policy gradient reinforcement learning, which uses two trust regions to regulate each policy update. In addition to using the …
What motivates agents to explore? Successfully answering this question would enable agents to learn efficiently in formidable tasks. Random explorations such as 𝜖-greedy are …
Prompt optimization is essential for enhancing the performance of Large Language Models (LLMs) in a range of Natural Language Processing (NLP) tasks, particularly in scenarios of …
Episodic memory is often illustrated with the madeleine de Proust excerpt as the ability to re-experience a situation from the past following the perception of a stimulus. This simplistic …