of talking faces. Recent deep learning approaches to facial synthesis rely on extracting low-
dimensional representations and concatenating them, followed by a decoding step of the
concatenated vector. This accounts for only first-order interactions of the features and
ignores higher-order interactions. In this paper we propose a polynomial fusion layer that
models the joint representation of the encodings by a higher-order polynomial, with the …