Deep audio-visual learning: A survey

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer
Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

Audio surveillance: A systematic review

M Crocco, M Cristani, A Trucco, V Murino - ACM Computing Surveys …, 2016 - dl.acm.org
Despite surveillance systems becoming increasingly ubiquitous in our living environment,
automated surveillance, currently based on video sensory modality and machine …

Audio-visual scene analysis with self-supervised multisensory features

A Owens, AA Efros - Proceedings of the European …, 2018 - openaccess.thecvf.com
The thud of a bouncing ball, the onset of speech as lips open--when visual and audio events
occur together, it suggests that there might be a common, underlying event that produced …

VisualVoice: Audio-visual speech separation with cross-modal consistency

R Gao, K Grauman - 2021 IEEE/CVF Conference on Computer …, 2021 - ieeexplore.ieee.org
We introduce a new approach for audio-visual speech separation. Given a video, the goal is
to extract the speech associated with a face in spite of simultaneous background sounds …

The sound of pixels

H Zhao, C Gan, A Rouditchenko… - Proceedings of the …, 2018 - openaccess.thecvf.com
We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos,
learns to locate image regions which produce sounds and separate the input sounds into a …

SoundSpaces: Audio-visual navigation in 3D environments

C Chen, U Jain, C Schissler, SVA Gari… - Computer Vision–ECCV …, 2020 - Springer
Moving around in the world is naturally a multisensory experience, but today's embodied
agents are deaf—restricted to solely their visual perception of the environment. We introduce …

Audio-visual event localization in unconstrained videos

Y Tian, J Shi, B Li, Z Duan, C Xu - Proceedings of the …, 2018 - openaccess.thecvf.com
In this paper, we introduce a novel problem of audio-visual event localization in
unconstrained videos. We define an audio-visual event as an event that is both visible and …

A closer look at weakly-supervised audio-visual source localization

S Mo, P Morgado - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Audio-visual source localization is a challenging task that aims to predict the location of
visual sound sources in a video. Since collecting ground-truth annotations of sounding …

Music gesture for visual sound separation

C Gan, D Huang, H Zhao… - Proceedings of the …, 2020 - openaccess.thecvf.com
Recent deep learning approaches have achieved impressive performance on visual sound
separation tasks. However, these approaches are mostly built on appearance and optical …

Learning to localize sound source in visual scenes

A Senocak, TH Oh, J Kim, MH Yang… - Proceedings of the …, 2018 - openaccess.thecvf.com
Visual events are usually accompanied by sounds in our daily lives. We pose the question:
Can the machine learn the correspondence between visual scene and the sound, and …