MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

K Li, A Gupta, A Reddy, VH Pong… - International …, 2021 - proceedings.mlr.press
Exploration in reinforcement learning is, in general, a challenging problem. A common
technique to make learning easier is providing demonstrations from a human supervisor, but …

MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

K Li, A Gupta, A Reddy, V Pong, A Zhou, J Yu… - arXiv preprint arXiv …, 2021 - arxiv.org
Exploration in reinforcement learning is a challenging problem: in the worst case, the agent
must search for high-reward states that could be hidden anywhere in the state space. Can …

[PDF] MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

K Li, A Gupta, A Reddy, V Pong, A Zhou, J Yu, S Levine - academia.edu
Exploration in reinforcement learning is a challenging problem: in the worst case, the agent
must search for high-reward states that could be hidden anywhere in the state space. Can …

MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

K Li, A Gupta, A Reddy, V Pong, A Zhou, J Yu, S Levine - icml.cc

MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

K Li, A Gupta, A Reddy, V Pong, A Zhou, J Yu… - arXiv e …, 2021 - ui.adsabs.harvard.edu
Exploration in reinforcement learning is a challenging problem: in the worst case, the agent
must search for high-reward states that could be hidden anywhere in the state space. Can …
