P Chakravarty, J Zegers, T Tuytelaars - homes.esat.kuleuven.be
In this work, we show how to co-train a classifier for active speaker detection using audio-visual
data. First, audio Voice Activity Detection (VAD) is used to train a personalized video …