P Thomas - International Conference on Machine Learning, 2014 - proceedings.mlr.press
We show that several popular discounted reward natural actor-critics, including the popular
NAC-LSTD and eNAC algorithms, do not generate unbiased estimates of the natural policy …