AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Cited by 418 · Related articles · All 8 versions

[PDF] core.ac.uk

[PDF] Speech synthesis evaluation—state-of-the-art assessment and suggestion for a novel research program

P Wagner, J Beskow, S Betz, J Edlund… - Proceedings of the 10th …, 2019 - core.ac.uk

Speech synthesis applications have become a ubiquity, in navigation systems, digital
assistants or as screen or audio book readers. Despite their impact on the acceptability of …

Cited by 104 · Related articles · All 11 versions

[PDF] arxiv.org

The VoiceMOS Challenge 2022

WC Huang, E Cooper, Y Tsao, HM Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

We present the first edition of the VoiceMOS Challenge, a scientific event that aims to
promote the study of automatic prediction of the mean opinion score (MOS) of synthetic …

Cited by 128 · Related articles · All 9 versions

[PDF] arxiv.org

MOSNet: Deep learning based objective assessment for voice conversion

CC Lo, SW Fu, WC Huang, X Wang… - arXiv preprint arXiv …, 2019 - arxiv.org

Existing objective evaluation metrics for voice conversion (VC) are not always correlated
with human perception. Therefore, training VC models with such criteria may not effectively …

Cited by 327 · Related articles · All 14 versions

[PDF] arxiv.org

UTMOS: UTokyo-SaruLab system for VoiceMOS Challenge 2022

T Saeki, D Xin, W Nakata, T Koriyama… - arXiv preprint arXiv …, 2022 - arxiv.org

We present the UTokyo-SaruLab mean opinion score (MOS) prediction system submitted to
VoiceMOS Challenge 2022. The challenge is to predict the MOS values of speech samples …

Cited by 174 · Related articles · All 10 versions

[PDF] arxiv.org

MBNet: MOS prediction for synthesized speech with mean-bias network

Y Leng, X Tan, S Zhao, F Soong… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Mean opinion score (MOS) is a popular subjective metric to assess the quality of
synthesized speech, and usually involves multiple human judges to evaluate each speech …

Cited by 110 · Related articles · All 4 versions

[PDF] arxiv.org

CDPAM: Contrastive learning for perceptual audio similarity

P Manocha, Z Jin, R Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Many speech processing methods based on deep learning require an automatic and
differentiable audio metric for the loss function. The DPAM approach of Manocha et al.[1] …

Cited by 83 · Related articles · All 5 versions

[PDF] arxiv.org

Deep learning based assessment of synthetic speech naturalness

G Mittag, S Möller - arXiv preprint arXiv:2104.11673, 2021 - arxiv.org

In this paper, we present a new objective prediction model for synthetic speech naturalness.
It can be used to evaluate Text-To-Speech or Voice Conversion systems and works …

Cited by 82 · Related articles · All 8 versions

[PDF] arxiv.org

Utilizing self-supervised representations for MOS prediction

WC Tseng, C Huang, WT Kao, YY Lin, H Lee - arXiv preprint arXiv …, 2021 - arxiv.org

Speech quality assessment has been a critical issue in speech processing for decades.
Existing automatic evaluations usually require clean references or parallel ground truth data …

Cited by 66 · Related articles · All 7 versions

[PDF] neurips.cc

NORESQA: A framework for speech quality assessment using non-matching references

P Manocha, B Xu, A Kumar - Advances in neural …, 2021 - proceedings.neurips.cc

The perceptual task of speech quality assessment (SQA) is a challenging task for machines
to do. Objective SQA methods that rely on the availability of the corresponding clean …

Cited by 48 · Related articles · All 8 versions
