Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

E Tzinis, Y Adi, VK Ithapu, B Xu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
We present RemixIT, a simple yet effective self-supervised method for training speech
enhancement without the need of a single isolated in-domain speech nor a noise waveform …

Unsupervised speech enhancement with speech recognition embedding and disentanglement losses

VA Trinh, S Braun - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
Speech enhancement has recently achieved great success with various deep learning
methods. However, most conventional speech enhancement systems are trained with …

Diffusion-based speech enhancement with joint generative and predictive decoders

H Shi, K Shimada, M Hirano, T Shibuya… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Diffusion-based generative speech enhancement (SE) has recently received attention, but
reverse diffusion remains time-consuming. One solution is to initialize the reverse diffusion …

Continual self-training with bootstrapped remixing for speech enhancement

E Tzinis, Y Adi, VK Ithapu, B Xu… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
We propose RemixIT, a simple and novel self-supervised training method for speech
enhancement. The proposed method is based on a continuously self-training scheme that …

Efficient personalized speech enhancement through self-supervised learning

A Sivaraman, M Kim - IEEE Journal of Selected Topics in Signal …, 2022 - ieeexplore.ieee.org
This work presents self-supervised learning methods for monaural speaker-specific (i.e.,
personalized) speech enhancement models. While general-purpose models must broadly …

Large-scale unsupervised audio pre-training for video-to-speech synthesis

T Kefalas, Y Panagakis, M Pantic - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Video-to-speech synthesis is the task of reconstructing the speech signal from a silent video
of a speaker. Previous approaches train on data from almost exclusively audio-visual …

Power Normalized Gammachirp Cepstral (PNGC) coefficients-based approach for robust speaker recognition

Y Zouhir, M Zarka, K Ouni - Applied Acoustics, 2023 - Elsevier
Speaker identification or recognition task aims to identify persons from their voices. This
paper introduces a new feature extraction approach for robust speaker recognition named …

Self-supervised speech denoising using only noisy audio signals

J Wu, Q Li, G Yang, L Li, L Senhadji, H Shu - Speech Communication, 2023 - Elsevier
In traditional speech denoising tasks, clean audio signals are often used as the training
target, but absolutely clean signals are collected from expensive recording equipment or in …

Using Deep Learning to Classify Environmental Sounds in the Habitat of Western Black-Crested Gibbons

R Hu, K Hu, L Wang, Z Guan, X Zhou, N Wang, L Ye - Diversity, 2024 - mdpi.com
The western black-crested gibbon (Nomascus concolor) is a rare and endangered primate
that inhabits southern China and northern Vietnam, and has become a key conservation …