AVSE challenge: Audio-visual speech enhancement challenge

ALA Blanco, C Valentini-Botinhao… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Audio-visual speech enhancement is the task of improving the quality of a speech signal
when video of the speaker is available. It opens up the opportunity of improving speech …

A machine speech chain approach for dynamically adaptive Lombard TTS in static and dynamic noise environments

S Novitasari, S Sakti… - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Recent end-to-end text-to-speech synthesis (TTS) systems have successfully synthesized
high-quality speech. However, TTS speech intelligibility degrades in noisy environments …

MetricGAN-OKD: multi-metric optimization of MetricGAN via online knowledge distillation for speech enhancement

W Shin, BH Lee, JS Kim, HJ Park… - … on Machine Learning, 2023 - proceedings.mlr.press
In speech enhancement, MetricGAN-based approaches reduce the discrepancy between
the $L_p$ loss and evaluation metrics by utilizing a non-differentiable evaluation metric as …

Acoustic sensing analytics applied to speech in reverberation conditions

P Odya, J Kotus, A Kurowski, B Kostek - Sensors, 2021 - mdpi.com
The paper aims to discuss a case study of sensing analytics and technology in acoustics
when applied to reverberation conditions. Reverberation is one of the issues that makes …

Multi-metric optimization using generative adversarial networks for near-end speech intelligibility enhancement

H Li, J Yamagishi - IEEE/ACM Transactions on Audio, Speech …, 2021 - ieeexplore.ieee.org
The intelligibility of speech severely degrades in the presence of environmental noise and
reverberation. In this paper, we propose a novel deep learning based system for modifying …

SaSLaW: Dialogue speech corpus with audio-visual egocentric information toward environment-adaptive dialogue speech synthesis

O Take, S Takamichi, K Seki, Y Bando… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents SaSLaW, a spontaneous dialogue speech corpus containing
synchronous recordings of what speakers speak, listen to, and watch. Humans consider the …

A Sound Engineering Approach to Near End Listening Enhancement

C Chermaz, S King - INTERSPEECH, 2020 - isca-archive.org
We present the beta version of ASE (the Automatic Sound Engineer), a NELE (Near End
Listening Enhancement) algorithm based on audio engineering knowledge. Generations of …

Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function

T Ngo, R Kubo, M Akagi - Speech Communication, 2021 - Elsevier
This study focuses on identifying effective features for controlling speech to increase speech
intelligibility under adverse conditions. Previous approaches either cancel noise throughout …

Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise

T Raitio, P Petkov, J Li, M Shifas, A Davis… - arXiv preprint arXiv …, 2022 - arxiv.org
We present a neural text-to-speech (TTS) method that models natural vocal effort variation to
improve the intelligibility of synthetic speech in the presence of noise. The method consists …

The impact of speech type on listening effort and intelligibility for native and non-native listeners

O Simantiraki, AE Wagner, M Cooke - Frontiers in Neuroscience, 2023 - frontiersin.org
Listeners are routinely exposed to many different types of speech, including artificially-
enhanced and synthetic speech, styles which deviate to a greater or lesser extent from …