Y Zhou, X Liu, X Zhang, Y Zhang - arXiv preprint arXiv:2501.12785, 2025 - arxiv.org
This paper tackles efficiency and stability issues in learning from observations (LfO). We
begin by investigating how reward functions and policies generalize in LfO …