Authors
Aaron Chau, Kouhei Sekiguchi, Aditya Arie Nugraha, Kazuyoshi Yoshii, Kotaro Funakoshi
Publication date
2019/10/14
Conference
2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)
Pages
1-8
Publisher
IEEE
Description
We propose a novel audio-visual simultaneous localization and mapping (SLAM) framework that exploits the human pose and acoustic speech of human sound sources to allow a robot equipped with a microphone array and a monocular camera to track, map, and interact with human partners in an indoor environment. Since human interaction is characterized by features perceived not only in the visual modality but in the acoustic modality as well, SLAM systems must utilize information from both modalities. Using a state-of-the-art beamforming technique, we obtain sound components corresponding to speech and noise, and compute Direction-of-Arrival (DoA) estimates of active sound sources as useful representations of observed features in the acoustic modality. Through human pose estimated by a monocular camera, we obtain the relative positions of humans as representations of observed features in the visual modality …
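The abstract says DoA estimates of active sound sources are obtained with a beamforming technique, but does not specify which one. The sketch below illustrates the general idea with a simple steered-response-power (delay-and-sum) azimuth scan; the function name `srp_doa`, the array geometry, and all parameters are assumptions for illustration, not the paper's actual method.

```python
# Minimal sketch of DoA estimation via a steered-response-power
# (delay-and-sum) scan over candidate azimuths. Illustrative only:
# the paper's "state-of-the-art beamforming technique" is unspecified.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def srp_doa(frames, mic_xy, fs, n_candidates=72):
    """Return the azimuth (radians) with the highest steered-response power.

    frames : (n_mics, n_samples) time-domain snapshot from the array
    mic_xy : (n_mics, 2) microphone positions in meters
    fs     : sampling rate in Hz
    """
    n_mics, n_samples = frames.shape
    spectra = np.fft.rfft(frames, axis=1)           # per-mic spectra
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)  # bin frequencies

    best_angle, best_power = 0.0, -np.inf
    for az in np.linspace(0.0, 2.0 * np.pi, n_candidates, endpoint=False):
        direction = np.array([np.cos(az), np.sin(az)])
        # Far-field arrival-time offsets of each mic relative to the origin.
        delays = mic_xy @ direction / SPEED_OF_SOUND  # (n_mics,)
        # Phase-align each mic's spectrum toward this candidate direction,
        # then measure the power of the coherent sum (delay-and-sum).
        steering = np.exp(-2j * np.pi * np.outer(delays, freqs))
        power = np.sum(np.abs((spectra * steering).sum(axis=0)) ** 2)
        if power > best_power:
            best_angle, best_power = az, power
    return best_angle

# Usage with synthetic data: a hypothetical 4-mic square array.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mic_xy = np.array([[0.05, 0.05], [-0.05, 0.05],
                       [-0.05, -0.05], [0.05, -0.05]])
    frames = rng.standard_normal((4, 1024))
    az = srp_doa(frames, mic_xy, 16000)
    print(f"estimated azimuth: {np.degrees(az):.1f} deg")
```

In practice such DoA scans run per time frame, so tracking active speakers amounts to associating the per-frame azimuth peaks over time.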
Total citations
[Citations-per-year chart: 2020–2024]
Scholar articles
A Chau, K Sekiguchi, AA Nugraha, K Yoshii… - 2019 28th IEEE International Conference on Robot and …, 2019