Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Robust self-supervised audio-visual speech recognition

B Shi, WN Hsu, A Mohamed - arXiv preprint arXiv:2201.01763, 2022 - arxiv.org
Audio-based automatic speech recognition (ASR) degrades significantly in noisy
environments and is particularly vulnerable to interfering speech, as the model cannot …

Egocentric audio-visual object localization

C Huang, Y Tian, A Kumar… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Humans naturally perceive surrounding scenes by unifying sound and sight in a first-person
view. Likewise, machines are advanced to approach human intelligence by learning with …

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Egocentric deep multi-channel audio-visual active speaker localization

H Jiang, C Murdock, VK Ithapu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Augmented reality devices have the potential to enhance human perception and enable
other assistive functionalities in complex conversational environments. Effectively capturing …

Parametric ambisonic encoding of arbitrary microphone arrays

L McCormack, A Politis, R Gonzalez… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
This article proposes a parametric signal-dependent method for the task of encoding
microphone array signals into Ambisonic signals. The proposed method is presented and …

Egocentric auditory attention localization in conversations

F Ryan, H Jiang, A Shukla… - Proceedings of the …, 2023 - openaccess.thecvf.com
In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …

ReVISE: Self-supervised speech resynthesis with visual input for universal and generalized speech regeneration

WN Hsu, T Remez, B Shi… - Proceedings of the …, 2023 - openaccess.thecvf.com
Prior works on improving speech quality with visual input typically study each type of
auditory distortion separately (eg, separation, inpainting, video-to-speech) and present …

Sound source selection based on head movements in natural group conversation

H Lu, WO Brimijoin - Trends in Hearing, 2022 - journals.sagepub.com
To optimally improve signal-to-noise ratio in noisy environments, a hearing assistance
device must correctly identify what is signal and what is noise. Many of the biosignal-based …

An introduction to the speech enhancement for augmented reality (SPEAR) challenge

P Guiraud, S Hafezi, PA Naylor… - … on Acoustic Signal …, 2022 - ieeexplore.ieee.org
It is well known that microphone arrays can be used to enhance a target speaker in a noisy,
reverberant environment, with both spatial (eg beamforming) and statistical (eg source …