R Arandjelović, A Zisserman - …, Munich, Germany, September 8-14, 2018 …, 2018 - Springer
In this paper our objectives are, first, networks that can embed audio and visual inputs into a common space that is suitable for cross-modal retrieval; and second, a network that can …