Text-independent voice conversion using deep neural network based phonetic level features

Authors

Huadi Zheng, Weicheng Cai, Tianyan Zhou, Shilei Zhang, Ming Li

Publication date

2016/12/4

Conference

2016 23rd International Conference on Pattern Recognition (ICPR)

Pages

2872-2877

Publisher

IEEE

Description

This paper presents a phonetically-aware joint density Gaussian mixture model (JD-GMM) framework for voice conversion that no longer requires parallel data from the source speaker at the training stage. Considering that the phonetic level features contain text information which should be preserved in the conversion task, we propose a method that concatenates only phonetic discriminant features and spectral features extracted from the same target speaker's speech to train a JD-GMM. After the mapping relationship between these two features is trained, we can use phonetic discriminant features from the source speaker to estimate the target speaker's spectral features at the conversion stage. The phonetic discriminant features are extracted using PCA from the output layer of a deep neural network (DNN) in an automatic speech recognition (ASR) system. They can be seen as a low dimensional representation of the senone posteriors …
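
As a rough illustration of the mapping described above, the following sketch writes out the minimum mean-square-error conversion function commonly used with JD-GMMs. The abstract does not say which estimator the authors use, so this is an assumption, and the symbols (x_t, y_t, z_t, w_m, M) are notation introduced here rather than taken from the paper.

```latex
% Assumed notation (not from the abstract):
%   x_t : phonetic discriminant feature at frame t (PCA of senone posteriors)
%   y_t : target speaker's spectral feature at frame t
%   z_t : joint vector used to train the JD-GMM on target-speaker data only
\begin{align}
p(z_t) &= \sum_{m=1}^{M} w_m \, \mathcal{N}\!\left(z_t;\, \mu_m, \Sigma_m\right),
\qquad z_t = \begin{bmatrix} x_t \\ y_t \end{bmatrix}, \\
\mu_m &= \begin{bmatrix} \mu_m^{x} \\ \mu_m^{y} \end{bmatrix},
\qquad
\Sigma_m = \begin{bmatrix} \Sigma_m^{xx} & \Sigma_m^{xy} \\ \Sigma_m^{yx} & \Sigma_m^{yy} \end{bmatrix}, \\
P(m \mid x_t) &= \frac{w_m \, \mathcal{N}\!\left(x_t;\, \mu_m^{x}, \Sigma_m^{xx}\right)}
                      {\sum_{k=1}^{M} w_k \, \mathcal{N}\!\left(x_t;\, \mu_k^{x}, \Sigma_k^{xx}\right)}, \\
\hat{y}_t &= \sum_{m=1}^{M} P(m \mid x_t)
  \left[ \mu_m^{y} + \Sigma_m^{yx} \left(\Sigma_m^{xx}\right)^{-1} \left(x_t - \mu_m^{x}\right) \right].
\end{align}
```

Under these assumptions, training uses pairs (x_t, y_t) from the target speaker alone, while at conversion time x_t is computed from the source speaker's speech, which is why no parallel source data is needed.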

Total citations

Cited by 17

Citations per year, 2018 to 2024 (bar chart; extracted counts: 1, 4, 3, 3, 5, 1)
