Emphatic Algorithms for Deep Reinforcement Learning

R Jiang, T Zahavy, Z Xu, A White… - International …, 2021 - proceedings.mlr.press
Off-policy learning allows us to learn about possible policies of behavior from experience
generated by a different behavior policy. Temporal difference (TD) learning algorithms can …
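To make the snippet's idea concrete, here is a minimal sketch of off-policy TD learning with emphatic weighting. It is the classic tabular emphatic TD(0) update, not the deep-RL method proposed in this paper. It assumes one-hot (tabular) features, a constant interest i(s) = 1, and a per-step importance-sampling ratio rho = pi(a|s) / mu(a|s) supplied with each transition. The function name `etd0` and its tuple layout are hypothetical.

```python
def etd0(transitions, n_states, gamma=0.9, alpha=0.1, interest=1.0):
    """Hypothetical sketch: tabular emphatic TD(0) for off-policy evaluation.

    transitions: (s, rho, r, s_next, done) tuples collected under the behavior
    policy mu, where rho = pi(a|s) / mu(a|s) for the action actually taken.
    Assumes a constant interest i(s) = interest for every state.
    """
    v = [0.0] * n_states
    followon = 0.0  # follow-on trace F_t
    rho_prev = 0.0  # rho_{t-1}; 0 at an episode start, so F_t = i(S_t) there
    for s, rho, r, s_next, done in transitions:
        # F_t = gamma * rho_{t-1} * F_{t-1} + i(S_t)
        followon = gamma * rho_prev * followon + interest
        # With lambda = 0, the emphasis M_t equals the follow-on trace F_t.
        target = r if done else r + gamma * v[s_next]
        delta = target - v[s]
        # One-hot features: the weight update touches a single table entry,
        # scaled by the importance ratio and the emphasis.
        v[s] += alpha * rho * followon * delta
        # Reset the trace across episode boundaries.
        rho_prev = 0.0 if done else rho
    return v


# Usage: a two-state chain 0 -> 1 -> terminal with reward 1 on the last step,
# where the target and behavior policies agree (rho = 1).
episode = [(0, 1.0, 0.0, 1, False), (1, 1.0, 1.0, 0, True)]
print(etd0(episode * 500, n_states=2))
```

Transitions the target policy would never take (rho = 0) leave the values unchanged. The follow-on trace re-weights the updates the behavior policy does make so that they reflect the target policy's state distribution. This re-weighting is what distinguishes emphatic methods from plain importance-sampled TD.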

Emphatic Algorithms for Deep Reinforcement Learning

R Jiang, T Zahavy, Z Xu, A White, M Hessel… - arXiv e …, 2021 - ui.adsabs.harvard.edu
Off-policy learning allows us to learn about possible policies of behavior from experience
generated by a different behavior policy. Temporal difference (TD) learning algorithms can …

Emphatic Algorithms for Deep Reinforcement Learning

R Jiang, T Zahavy, Z Xu, A White, M Hessel… - arXiv preprint arXiv …, 2021 - arxiv.org
Off-policy learning allows us to learn about possible policies of behavior from experience
generated by a different behavior policy. Temporal difference (TD) learning algorithms can …