Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Learning to set waypoints for audio-visual navigation

C Chen, S Majumder, Z Al-Halah, R Gao… - arXiv preprint arXiv …, 2020 - arxiv.org
In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D
environment using both sights and sounds to find a sound source (e.g., a phone ringing in …

Few-shot audio-visual learning of environment acoustics

S Majumder, C Chen, Z Al-Halah… - Advances in Neural …, 2022 - proceedings.neurips.cc
Room impulse response (RIR) functions capture how the surrounding physical environment
transforms the sounds heard by a listener, with implications for various applications in AR …
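A minimal sketch of the RIR idea in the snippet above, not the paper's few-shot method: if the room is treated as a linear, time-invariant system, the sound a listener hears is the dry source signal convolved with the RIR. The helper name `apply_rir` and the toy two-tap RIR (a direct path plus one reflection) are hypothetical illustrations.

```python
def apply_rir(dry, rir):
    """Convolve a dry (anechoic) signal with a room impulse response.

    Returns the full linear convolution, of length len(dry) + len(rir) - 1,
    i.e. the "wet" signal a listener would hear in that room.
    """
    wet = [0.0] * (len(dry) + len(rir) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(rir):
            # Each input sample is re-emitted by every RIR tap, delayed by j samples.
            wet[i + j] += x * h
    return wet


# Hypothetical toy RIR: direct path at tap 0, one reflection 3 samples later at half amplitude.
toy_rir = [1.0, 0.0, 0.0, 0.5]
dry = [1.0, 2.0, 3.0]
wet = apply_rir(dry, toy_rir)
print(wet)  # [1.0, 2.0, 3.0, 0.5, 1.0, 1.5]
```

Measured RIRs are typically thousands of taps long, so an FFT-based convolution (e.g. `scipy.signal.fftconvolve`) is the usual choice in practice; the double loop here only keeps the example dependency-free.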

VisualEchoes: Spatial Image Representation Learning Through Echolocation

R Gao, C Chen, Z Al-Halah, C Schissler… - Computer Vision–ECCV …, 2020 - Springer
Several animal species (e.g., bats, dolphins, and whales) and even visually impaired humans
have the remarkable ability to perform echolocation: a biological sonar used to perceive …

Audio-visual floorplan reconstruction

S Purushwalkam, SVA Gari, VK Ithapu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Given only a few glimpses of an environment, how much can we infer about its entire
floorplan? Existing methods can map only what is visible or immediately apparent from …

Beyond image to depth: Improving depth prediction using echoes

KK Parida, S Srivastava… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We address the problem of estimating depth with multimodal audio-visual data. Inspired by
the ability of animals, such as bats and dolphins, to infer the distance of objects with …
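Echo-based distance cues rest on time of flight: a reflector at distance d returns an echo after a round-trip delay t = 2d / c, so d = c * t / 2. The sketch below is an illustration of that principle, not the depth-prediction method of this paper. It estimates the delay by brute-force cross-correlation of a known emitted pulse with the recording. The helper names, the 343 m/s speed of sound, the 48 kHz sample rate, and the echo-only recording (no direct-path sound) are all assumptions.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: approx. speed of sound in air at about 20 degrees C


def estimate_delay_samples(emitted, received):
    """Return the lag (in samples) at which `emitted` best matches `received`.

    Brute-force cross-correlation: slide the known pulse along the recording
    and keep the lag with the largest dot product.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(received) - len(emitted) + 1):
        score = sum(e * received[lag + k] for k, e in enumerate(emitted))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def echo_distance_m(delay_s, c=SPEED_OF_SOUND_M_S):
    """Distance to a reflector from a round-trip echo delay: d = c * t / 2."""
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    return c * delay_s / 2.0


# Hypothetical recording at 48 kHz containing only the reflected pulse,
# attenuated to 0.3 and arriving 560 samples after emission.
SAMPLE_RATE_HZ = 48_000
pulse = [1.0, -1.0, 1.0]
recording = [0.0] * 1000
for k, p in enumerate(pulse):
    recording[560 + k] = 0.3 * p

lag = estimate_delay_samples(pulse, recording)
distance = echo_distance_m(lag / SAMPLE_RATE_HZ)
print(lag, round(distance, 3))  # 560 2.001
```

The factor of 2 reflects the round trip; with real recordings the direct-path sound and multiple reflections also produce correlation peaks, which is one reason learned approaches go beyond a single time-of-flight estimate.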

Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation

H Yun, J Na, G Kim - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Sound can convey significant information for spatial reasoning in our daily lives. To endow
deep networks with such ability, we address the challenge of dense indoor prediction with …

Beyond mono to binaural: Generating binaural audio from mono audio with depth and cross modal attention

KK Parida, S Srivastava… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Binaural audio gives the listener an immersive experience and can enhance augmented
and virtual reality. However, recording binaural audio requires specialized setup with a …

Geometry-aware multi-task learning for binaural audio generation from video

R Garg, R Gao, K Grauman - arXiv preprint arXiv:2111.10882, 2021 - arxiv.org
Binaural audio provides human listeners with an immersive spatial sound experience, but
most existing videos lack binaural audio recordings. We propose an audio spatialization …

SOUNDCAM: a dataset for finding humans using room acoustics

M Wang, S Clarke, JH Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
A room's acoustic properties are a product of the room's geometry, the objects within the
room, and their specific positions. A room's acoustic properties can be characterized by its …