A speech separation system in video sequence using dilated inception network and U-Net

G Dahy, MAA Refaey, R Alkhoribi, M Shoman - Egyptian Informatics Journal, 2022 - Elsevier
This paper proposes an audio-visual model for separating the target speaker's speech from a
mixture of other speakers' speech. It can be used in speech separation …

Audio visual speech source separation via improved context dependent association model

A Kazemi, R Boostani, F Sobhanmanesh - EURASIP Journal on Advances …, 2014 - Springer
In this paper, we exploit the non-linear relation between a speech source and its associated
lip video as a source of extra information to propose an improved audio-visual speech …

Speaker-Independent Audio-Visual Speech Separation Based on Transformer in Multi-Talker Environments

J Wang, Y Luo, W Yi, X Xie - IEICE TRANSACTIONS on Information …, 2022 - search.ieice.org
Speech separation is the task of extracting target speech while suppressing background
interference components. In applications like video telephones, visual information about the …

Time-domain audio-visual speech separation on low quality videos

Y Wu, C Li, J Bai, Z Wu, Y Qian - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Incorporating visual information is a promising approach to improve the performance of
speech separation. Many related works have been conducted and provide inspiring results …

Integration of deep learning with expectation maximization for spatial cue-based speech separation in reverberant conditions

S Gul, MS Khan, SW Shah - Applied Acoustics, 2021 - Elsevier
In this paper, we formulate a blind source separation (BSS) framework, which allows
integrating U-Net based deep learning source separation network with probabilistic spatial …

Multi-Stream Gated and Pyramidal Temporal Convolutional Neural Networks for Audio-Visual Speech Separation in Multi-Talker Environments

Y Luo, J Wang, L Xu, L Yang - Interspeech, 2021 - researchgate.net
Speech separation is the task of extracting target speech from a noisy mixture. In applications
like video telephones or video conferencing, lip movements of the target speaker are …

Speech segregation in background noise based on deep learning

JB Awotunde, RO Ogundokun, FE Ayo… - IEEE Access, 2020 - ieeexplore.ieee.org
The most important way several people communicate is through speech. Speech is used to
convey other information such as speaker communication, emotion, and attitude. Therefore …

Multi-layer attention mechanism based speech separation model

M Li, T Lan, C Peng, Y Qian… - 2019 IEEE 19th …, 2019 - ieeexplore.ieee.org
Speech separation is the front-end of speech processing applications. Its purpose is to
separate the speech in a multi-speaker environment. The neural network methods show …

Audio-visual speaker separation

F Khan - 2016 - ueaeprints.uea.ac.uk
Communication using speech is often an audio-visual experience. Listeners hear what is
being uttered by speakers and also see the corresponding facial movements and other …

Implementation of real-time speech separation model using time-domain audio separation network (TasNet) and dual-path recurrent neural network (DPRNN)

A Wijayakusuma, DR Gozali, A Widjaja… - Procedia Computer …, 2021 - Elsevier
The purpose of this research is to develop a model that can perform real-time,
speaker-independent, multi-talker speech separation in the time domain using Time-Domain Audio …