A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

A Metaverse: Taxonomy, components, applications, and open challenges

SM Park, YG Kim - IEEE Access, 2022 - ieeexplore.ieee.org
Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …

End-to-end spectro-temporal graph attention networks for speaker verification anti-spoofing and speech deepfake detection

H Tak, J Jung, J Patino, M Kamble, M Todisco… - arXiv preprint arXiv …, 2021 - arxiv.org
Artefacts that serve to distinguish bona fide speech from spoofed or deepfake speech are
known to reside in specific subbands and temporal segments. Various approaches can be …

Implicit neural spatial filtering for multichannel source separation in the waveform domain

D Markovic, A Defossez, A Richard - arXiv preprint arXiv:2206.15423, 2022 - arxiv.org
We present a single-stage causal waveform-to-waveform multichannel model that can
separate moving sound sources based on their broad spatial locations in a dynamic …

Complex-valued spatial autoencoders for multichannel speech enhancement

MM Halimeh, W Kellermann - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In this contribution, we present a novel online approach to multichannel speech
enhancement. The proposed method estimates the enhanced signal through a filter-and …

A novel approach to multi-channel speech enhancement based on graph neural networks

HN Chau, TD Bui, HB Nguyen… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Multi-channel speech enhancement aims at utilizing spatial relationships between signals
captured from a microphone array along with temporal-spectral information efficiently to …

Leveraging low-distortion target estimates for improved speech enhancement

ZQ Wang, G Wichern, JL Roux - arXiv preprint arXiv:2110.00570, 2021 - arxiv.org
A promising approach for multi-microphone speech separation involves two deep neural
networks (DNN), where the predicted target speech from the first DNN is used to compute …

Time-domain speech separation networks with graph encoding auxiliary

T Wang, Z Pan, M Ge, Z Yang… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org
End-to-end time-domain speech separation with masking strategy has shown its
performance advantage, where a 1-D convolutional layer is used as the speech encoder to …

Deep neural mel-subband beamformer for in-car speech separation

V Kothapally, Y Xu, M Yu, SX Zhang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
While current deep learning (DL)-based beamforming techniques have been proved
effective in speech separation, they are often designed to process narrow-band (NB) …

Graph neural networks for sound source localization on distributed microphone networks

E Grinstein, M Brookes… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Distributed Microphone Arrays (DMAs) present many challenges with respect to centralized
microphone arrays. An important requirement of applications on these arrays is handling a …