MetaCURL: Non-stationary Concave Utility Reinforcement Learning

BM Moreno, M Brégère, P Gaillard… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore online learning in episodic loop-free Markov decision processes in
non-stationary environments (changing losses and transition probabilities). Our focus is on the …