Model-based episodic memory induces dynamic hybrid controls

D Nguyen, P Nguyen, H Le, K Do… - Proceedings of the …, 2023 - ojs.aaai.org

Social reasoning necessitates the capacity of theory of mind (ToM), the ability to
contextualise and attribute mental states to others without having access to their internal …

Cited by 10 · Related articles · All 7 versions

Neural episodic control with state abstraction

Z Li, D Zhu, Y Hu, X Xie, L Ma, Y Zheng, Y Song… - arXiv preprint arXiv …, 2023 - arxiv.org

Existing Deep Reinforcement Learning (DRL) algorithms suffer from sample inefficiency.
Generally, episodic control-based approaches are solutions that leverage highly-rewarded …

Cited by 10 · Related articles · All 5 versions

Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations

G Wang, F Wu, X Zhang, T Chen - arXiv preprint arXiv:2401.00162, 2023 - arxiv.org

The sparsity of reward feedback remains a challenging problem in online deep
reinforcement learning (DRL). Previous approaches have utilized temporal credit …

Cited by 1 · Related articles · All 2 versions

Temporally extended successor feature neural episodic control

X Zhu - Scientific Reports, 2024 - nature.com

One of the long-term goals of reinforcement learning is to build intelligent agents capable of
rapidly learning and flexibly transferring skills, similar to humans and animals. In this paper …

Continuous episodic control

Z Yang, TM Moerland, M Preuss… - 2023 IEEE Conference …, 2023 - ieeexplore.ieee.org

Non-parametric episodic memory can be used to quickly latch onto high-rewarded
experience in reinforcement learning tasks. In contrast to parametric deep reinforcement …

Cited by 3 · Related articles · All 4 versions
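
The Continuous episodic control snippet describes a non-parametric episodic memory that "latches onto" high-rewarded experience. The sketch below shows that general idea only. It is not the paper's algorithm, and it makes several assumptions:
- The class EpisodicMemory and the helper discounted_returns are hypothetical.
- Entries are stored as (state, action, return) tuples.
- Only the highest return seen for an exact state is kept.
- A full memory evicts its lowest-return entry.
- A query returns the action of the best-return entry among the k nearest stored states, using Euclidean distance.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch of a non-parametric episodic memory; not the cited paper's method.
Entry = Tuple[Tuple[float, ...], Tuple[float, ...], float]  # (state, action, return)


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


class EpisodicMemory:
    def __init__(self, capacity: int = 10_000) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries: List[Entry] = []

    def write(self, state: Sequence[float], action: Sequence[float], ret: float) -> None:
        key = tuple(state)
        for i, (s, _, r) in enumerate(self.entries):
            if s == key:
                # "Latch on": overwrite a stored state only if the new return is higher.
                if ret > r:
                    self.entries[i] = (key, tuple(action), ret)
                return
        if len(self.entries) >= self.capacity:
            # Assumed eviction rule: drop the entry with the lowest stored return.
            worst = min(range(len(self.entries)), key=lambda i: self.entries[i][2])
            self.entries.pop(worst)
        self.entries.append((key, tuple(action), ret))

    def best_action(self, state: Sequence[float], k: int = 3) -> Optional[Tuple[float, ...]]:
        """Action of the highest-return entry among the k nearest stored states."""
        if not self.entries:
            return None
        nearest = sorted(self.entries, key=lambda e: math.dist(e[0], state))[:k]
        return max(nearest, key=lambda e: e[2])[1]


# Usage: store one short episode, then query a state close to (0.0, 1.0).
memory = EpisodicMemory(capacity=100)
states = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
actions = [(0.5,), (-0.5,), (0.2,)]
for s, a, g in zip(states, actions, discounted_returns([0.0, 0.0, 1.0], gamma=0.9)):
    memory.write(s, a, g)
print(memory.best_action((0.1, 0.9), k=2))  # (0.2,)
```

Storing only the best return per state is what makes such a memory "latch on" quickly. The trade-off is that it is optimistic in stochastic environments.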

Episodic policy gradient training

H Le, M Abdolshah, TK George, K Do… - Proceedings of the …, 2022 - ojs.aaai.org

We introduce a novel training procedure for policy gradient methods wherein episodic
memory is used to optimize the hyperparameters of reinforcement learning algorithms on …

Cited by 5 · Related articles · All 10 versions

Learning to constrain policy optimization with virtual trust region

TH Le, T Karimpanal George… - Advances in …, 2022 - proceedings.neurips.cc

We introduce a constrained optimization method for policy gradient reinforcement learning,
which uses two trust regions to regulate each policy update. In addition to using the …

Cited by 3 · Related articles · All 8 versions

Beyond Surprise: Improving Exploration Through Surprise Novelty

H Le, K Do, D Nguyen, S Venkatesh - AAMAS, 2024 - thaihungle.github.io

What motivates agents to explore? Successfully answering this question would enable
agents to learn efficiently in formidable tasks. Random explorations such as 𝜖-greedy are …

Cited by 2 · Related articles · All 3 versions
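
The Beyond Surprise snippet cites ε-greedy as an example of random exploration, the baseline the paper argues beyond. For reference, here is a minimal sketch of plain ε-greedy action selection. It is not the paper's surprise-novelty method. The function name epsilon_greedy and the seeded random.Random generator are illustrative choices.

```python
import random
from collections import Counter
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Return an action index: uniformly random with probability epsilon, else greedy."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if not q_values:
        raise ValueError("need at least one action")
    if rng.random() < epsilon:
        # Explore: every action, including the greedy one, is equally likely.
        return rng.randrange(len(q_values))
    # Exploit: highest estimated value; max() keeps the first index on ties.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Usage: with epsilon = 0.1 the greedy action (index 1) dominates, but others still appear.
rng = random.Random(0)
counts = Counter(epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1, rng=rng) for _ in range(1000))
print(counts)
```

Exploration here ignores what the agent has already seen. That undirected behaviour is the property the snippet singles out.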

Large Language Models Prompting With Episodic Memory

D Do, Q Tran, S Venkatesh, H Le - arXiv preprint arXiv:2408.07465, 2024 - arxiv.org

Prompt optimization is essential for enhancing the performance of Large Language Models
(LLMs) in a range of Natural Language Processing (NLP) tasks, particularly in scenarios of …

Computational modeling of the interactions between episodic memory and cognitive control

H Chateau-Laurent - 2024 - theses.hal.science

Episodic memory is often illustrated with the madeleine de Proust excerpt as the ability to re-
experience a situation from the past following the perception of a stimulus. This simplistic …

Cited by 1 · Related articles · All 2 versions