Audio-visual floorplan reconstruction

C Chen, C Schissler, S Garg… - Advances in …, 2022 - proceedings.neurips.cc

Abstract We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio
rendering for 3D environments. Given a 3D mesh of a real-world environment …

Cited by 58 · Related articles · All 8 versions

Semantic audio-visual navigation

C Chen, Z Al-Halah, K Grauman - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Recent work on audio-visual navigation assumes a constantly-sounding target and restricts
the role of audio to signaling the target's position. We introduce semantic audio-visual …

Cited by 104 · Related articles · All 14 versions

Toward practical monocular indoor depth estimation

CY Wu, J Wang, M Hall… - Proceedings of the …, 2022 - openaccess.thecvf.com

The majority of prior monocular depth estimation methods without groundtruth depth
guidance focus on driving scenarios. We show that such methods generalize poorly to …

Cited by 47 · Related articles · All 5 versions

Few-shot audio-visual learning of environment acoustics

S Majumder, C Chen, Z Al-Halah… - Advances in Neural …, 2022 - proceedings.neurips.cc

Room impulse response (RIR) functions capture how the surrounding physical environment
transforms the sounds heard by a listener, with implications for various applications in AR …

Cited by 34 · Related articles · All 7 versions

Pathdreamer: A world model for indoor navigation

JY Koh, H Lee, Y Yang, J Baldridge… - Proceedings of the …, 2021 - openaccess.thecvf.com

People navigating in unfamiliar buildings take advantage of myriad visual, spatial and
semantic cues to efficiently achieve their navigation goals. Towards equipping …

Cited by 63 · Related articles · All 10 versions

Move2hear: Active audio-visual source separation

S Majumder, Z Al-Halah… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

We introduce the active audio-visual source separation problem, where an agent must move
intelligently in order to better isolate the sounds coming from an object of interest in its …

Cited by 44 · Related articles · All 10 versions

Context understanding in computer vision: A survey

X Wang, Z Zhu - Computer Vision and Image Understanding, 2023 - Elsevier

Contextual information plays an important role in many computer vision tasks, such as object
detection, video action detection, image classification, etc. Recognizing a single object or …

Cited by 22 · Related articles · All 6 versions

A²Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models

P Chen, X Sun, H Zhi, R Zeng, TH Li, G Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

We study the task of zero-shot vision-and-language navigation (ZS-VLN), a practical yet
challenging problem in which an agent learns to navigate following a path described by …

Cited by 9 · Related articles · All 3 versions

Listening human behavior: 3d human pose estimation with acoustic signals

Y Shibata, Y Kawashima, M Isogawa… - Proceedings of the …, 2023 - openaccess.thecvf.com

Given only acoustic signals without any high-level information, such as voices or sounds of
scenes/actions, how much can we infer about the behavior of humans? Unlike existing …

Cited by 8 · Related articles · All 6 versions

Disentangled counterfactual learning for physical audiovisual commonsense reasoning

C Lv, S Zhang, Y Tian, M Qi… - Advances in Neural …, 2024 - proceedings.neurips.cc

In this paper, we propose a Disentangled Counterfactual Learning (DCL) approach for
physical audiovisual commonsense reasoning. The task aims to infer objects' physics …

Cited by 2 · Related articles · All 5 versions
