Z Chen, ST Maguluri, S Shakkottai… - Advances in Neural …, 2021 - proceedings.neurips.cc
In TD-learning, off-policy sampling is known to be more practical than on-policy sampling,
and by decoupling learning from data collection, it enables data reuse. It is known that policy …