Audio vision: Using audio-visual synchrony to locate sounds

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer

Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

Cited by 161 - Related articles - All 12 versions

Audio surveillance: A systematic review

M Crocco, M Cristani, A Trucco, V Murino - ACM Computing Surveys …, 2016 - dl.acm.org

Despite surveillance systems becoming increasingly ubiquitous in our living environment,
automated surveillance, currently based on video sensory modality and machine …

Cited by 300 - Related articles - All 6 versions

Audio-visual scene analysis with self-supervised multisensory features

A Owens, AA Efros - Proceedings of the European …, 2018 - openaccess.thecvf.com

The thud of a bouncing ball, the onset of speech as lips open--when visual and audio events
occur together, it suggests that there might be a common, underlying event that produced …

Cited by 832 - Related articles - All 8 versions

VisualVoice: Audio-visual speech separation with cross-modal consistency

R Gao, K Grauman - 2021 IEEE/CVF Conference on Computer …, 2021 - ieeexplore.ieee.org

We introduce a new approach for audio-visual speech separation. Given a video, the goal is
to extract the speech associated with a face in spite of simultaneous background sounds …

Cited by 164 - Related articles - All 9 versions

The sound of pixels

H Zhao, C Gan, A Rouditchenko… - Proceedings of the …, 2018 - openaccess.thecvf.com

We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos,
learns to locate image regions which produce sounds and separate the input sounds into a …

Cited by 565 - Related articles - All 10 versions

SoundSpaces: Audio-visual navigation in 3D environments

C Chen, U Jain, C Schissler, SVA Gari… - Computer Vision–ECCV …, 2020 - Springer

Moving around in the world is naturally a multisensory experience, but today's embodied
agents are deaf—restricted to solely their visual perception of the environment. We introduce …

Cited by 256 - Related articles - All 6 versions

Audio-visual event localization in unconstrained videos

Y Tian, J Shi, B Li, Z Duan, C Xu - Proceedings of the …, 2018 - openaccess.thecvf.com

In this paper, we introduce a novel problem of audio-visual event localization in
unconstrained videos. We define an audio-visual event as an event that is both visible and …

Cited by 451 - Related articles - All 11 versions

A closer look at weakly-supervised audio-visual source localization

S Mo, P Morgado - Advances in Neural Information …, 2022 - proceedings.neurips.cc

Audio-visual source localization is a challenging task that aims to predict the location of
visual sound sources in a video. Since collecting ground-truth annotations of sounding …

Cited by 46 - Related articles - All 6 versions

Music gesture for visual sound separation

C Gan, D Huang, H Zhao… - Proceedings of the …, 2020 - openaccess.thecvf.com

Recent deep learning approaches have achieved impressive performance on visual sound
separation tasks. However, these approaches are mostly built on appearance and optical …

Cited by 205 - Related articles - All 9 versions

Learning to localize sound source in visual scenes

A Senocak, TH Oh, J Kim, MH Yang… - Proceedings of the …, 2018 - openaccess.thecvf.com

Visual events are usually accompanied by sounds in our daily lives. We pose the question:
Can the machine learn the correspondence between visual scene and the sound, and …

Cited by 349 - Related articles - All 9 versions
