Neural analysis and synthesis: Reconstructing speech from self-supervised representations

HS Choi, J Lee, W Kim, J Lee… - Advances in Neural …, 2021 - proceedings.neurips.cc
We present a neural analysis and synthesis (NANSY) framework that can manipulate the
voice, pitch, and speed of an arbitrary speech signal. Most of the previous works have …

Deep learning-based non-intrusive multi-objective speech assessment model with cross-domain features

RE Zezario, SW Fu, F Chen, CS Fuh… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
This study proposes a cross-domain multi-objective speech assessment model, called
MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and …

MBNet: MOS prediction for synthesized speech with mean-bias network

Y Leng, X Tan, S Zhao, F Soong… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Mean opinion score (MOS) is a popular subjective metric to assess the quality of
synthesized speech, and usually involves multiple human judges to evaluate each speech …

Predictions of subjective ratings and spoofing assessments of voice conversion challenge 2020 submissions

RK Das, T Kinnunen, WC Huang, Z Ling… - arXiv preprint arXiv …, 2020 - arxiv.org
The Voice Conversion Challenge 2020 is the third edition under its flagship that promotes
intra-lingual semiparallel and cross-lingual voice conversion (VC). While the primary …

A review on subjective and objective evaluation of synthetic speech

E Cooper, WC Huang, Y Tsao, HM Wang… - Acoustical Science …, 2024 - jstage.jst.go.jp
Evaluating synthetic speech generated by machines is a complicated process, as it involves
judging along multiple dimensions including naturalness, intelligibility, and whether the …

SQuId: Measuring speech naturalness in many languages

T Sellam, A Bapna, J Camp… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Much of text-to-speech research relies on human evaluation. This incurs heavy costs and
slows down the development process, especially in heavily multilingual applications where …

Fusion of self-supervised learned models for MOS prediction

Z Yang, W Zhou, C Chu, S Li, R Dabre… - arXiv preprint arXiv …, 2022 - arxiv.org
We participated in the mean opinion score (MOS) prediction challenge, 2022. This
challenge aims to predict MOS scores of synthetic speech on two tracks, the main track and …

InQSS: a speech intelligibility and quality assessment model using a multi-task learning network

YW Chen, Y Tsao - arXiv preprint arXiv:2111.02585, 2021 - arxiv.org
Speech intelligibility and quality assessment models are essential tools for researchers to
evaluate and improve speech processing models. However, only a few studies have …

Non-Intrusive Speech Quality Assessment Based on Deep Neural Networks for Speech Communication

M Liu, J Wang, F Wang, F Xiang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Traditionally, speech quality evaluation relies on subjective assessments or intrusive
methods that require reference signals or additional equipment. However, over recent years …

Coded Speech Quality Measurement by a Non-Intrusive PESQ-DNN

Z Xu, Z Zhao, T Fingscheidt - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Wideband codecs such as AMR-WB or EVS are widely used in (mobile) speech
communication. Evaluation of coded speech quality is often performed subjectively by an …