VoxBlink: A Large Scale Speaker Verification Dataset on Camera

Y Lin, X Qin, G Zhao, M Cheng, N Jiang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In this paper, we introduce a large-scale and high-quality audiovisual speaker verification
dataset, named VoxBlink. We propose an innovative and robust automatic audio-visual data …

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

Z Pan, G Wichern, Y Masuyama… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Target speech extraction aims to extract, based on a given conditioning cue, a target speech
signal that is corrupted by interfering sources, such as noise or competing speakers …

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

S Wu, C Wang, H Chen, Y Dai, C Zhang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Previous Multimodal Information based Speech Processing (MISP) challenges mainly
focused on audio-visual speech recognition (AVSR) with commendable success. However …

Context-Aware Audio-Visual Speech Enhancement Based on Neuro-Fuzzy Modelling and User Preference Learning

S Chen, J Kirton-Wingate, F Doctor… - … on Fuzzy Systems, 2024 - ieeexplore.ieee.org
It is estimated that by 2050 approximately one in ten individuals globally will experience
disabling hearing impairment. In the presence of everyday reverberant noise, a substantial …

Speech Enhancement: A Survey of Approaches and Applications

S Chhetri, MS Joshi, CV Mahamuni… - … Conference on Edge …, 2023 - ieeexplore.ieee.org
The paper provides a comprehensive overview of speech enhancement techniques and
their applications. It discusses challenges in non-stationary noise, reverberation, and …

AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling

VA Kalkhorani, C Yu, A Kumar, K Tan, B Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Adding visual cues to audio-based speech separation can improve separation performance.
This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement …

AV2WAV: Diffusion-Based Re-Synthesis from Continuous Self-Supervised Features for Audio-Visual Speech Enhancement

JC Chou, CM Chien, K Livescu - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Speech enhancement systems are typically trained using pairs of clean and noisy speech. In
audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data …

Efficient Intelligibility Evaluation Using Keyword Spotting: A Study on Audio-Visual Speech Enhancement

C Valentini-Botinhao, ALA Blanco… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose a new method for human speech intelligibility evaluation based on keyword
spotting. In this method, participants play a stimulus and select the word they hear from a …

Diffusion-based Unsupervised Audio-visual Speech Enhancement

JE Ayilo, M Sadeghi, R Serizel… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper proposes a new unsupervised audiovisual speech enhancement (AVSE)
approach that combines a diffusion-based audio-visual speech generative model with a non …

Socio-Technical Trust For Multi-Modal Hearing Assistive Technology

J Williams, T Azim, AM Piskopani… - … , Speech, and Signal …, 2023 - ieeexplore.ieee.org
The landscape of opportunity is rapidly changing for audio-visual (AV) hearing assistive
technology. While hearing assistive devices, such as hearing aids, have traditionally been …