Z Wang, D Zhou, J Lui, W Sun - arXiv preprint arXiv:2408.08994, 2024 - arxiv.org
Learning a transition model via Maximum Likelihood Estimation (MLE) followed by planning inside the learned model is perhaps the most standard and simplest Model-based …
Policy Optimization (PO) methods are among the most popular Reinforcement Learning (RL) algorithms in practice. Recently, Sherman et al. [2023a] proposed a PO-based algorithm with …