decomposition problems. Empirically, such training process often first fits larger components
and then discovers smaller components, which is similar to a tensor deflation process that is
commonly used in tensor decomposition algorithms. We prove that for orthogonally
decomposable tensor, a slightly modified version of gradient flow would follow a tensor
deflation process and recover all the tensor components. Our proof suggests that for …