A Escontrela, XB Peng, W Yu, T Zhang, A Iscen… - arXiv preprint arXiv …, 2022 - arxiv.org
Training a high-dimensional simulated agent with an under-specified reward function often
leads the agent to learn physically infeasible strategies that are ineffective when deployed in …