An overview of voice conversion and its challenges: From statistical modeling to deep learning B Sisman, J Yamagishi, S King, H Li IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 132-157, 2021 | 316 | 2021 |
Seen and unseen emotional style transfer for voice conversion with a new emotional speech dataset K Zhou, B Sisman, R Liu, H Li IEEE ICASSP 2021 International Conference on Acoustics, Speech, and Signal …, 2021 | 168 | 2021 |
Emotional Voice Conversion: Theory, Databases and ESD K Zhou, B Sisman, R Liu, H Li Speech Communication, 2022 | 119 | 2022 |
Expressive TTS Training with Frame and Style Reconstruction Loss R Liu, B Sisman, G Gao, H Li IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021 | 85 | 2021 |
VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019 A Tjandra, B Sisman, M Zhang, S Sakti, H Li, S Nakamura Proc. Interspeech 2019, 2019 | 85 | 2019 |
Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data K Zhou, B Sisman, H Li Proc. Odyssey 2020, Tokyo, Japan, 2020 | 76 | 2020 |
Teacher-Student Training for Robust Tacotron-based TTS R Liu, B Sisman, J Li, F Bao, G Gao, H Li IEEE ICASSP 2020 International Conference on Acoustics, Speech, and Signal …, 2020 | 64 | 2020 |
A voice conversion framework with tandem feature sparse representation and speaker-adapted wavenet vocoder B Sisman, M Zhang, H Li Proc. Interspeech, 1978-1982, 2018 | 59 | 2018 |
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion K Zhou, B Sisman, M Zhang, H Li Proc. Interspeech 2020, 2020 | 56 | 2020 |
Group sparse representation with wavenet vocoder adaptation for spectrum and prosody conversion B Sisman, M Zhang, H Li IEEE/ACM Transactions on Audio, Speech, and Language Processing 27 (6), 1085 …, 2019 | 51 | 2019 |
Sparse representation of phonetic features for voice conversion with and without parallel data B Sisman, H Li, KC Tan Automatic Speech Recognition and Understanding Workshop (ASRU), 2017 IEEE …, 2017 | 48 | 2017 |
SINGAN: Singing voice conversion with generative adversarial networks B Sisman, K Vijayan, M Dong, H Li Asia-Pacific Signal and Information Processing Association Annual Summit and …, 2019 | 45 | 2019 |
Adaptive Wavenet Vocoder for Residual Compensation in GAN-based Voice Conversion B Sisman, M Zhang, S Sakti, H Li, S Nakamura 2018 IEEE Spoken Language Technology Workshop (SLT), 282-289, 2018 | 45 | 2018 |
Emotion Intensity and its Control for Emotional Voice Conversion K Zhou, B Sisman, R Rana, BW Schuller, H Li IEEE Transactions on Affective Computing, 2023 | 42 | 2023 |
VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech K Zhou, B Sisman, H Li 2021 IEEE Spoken Language Technology Workshop (SLT 2021), 2021 | 39 | 2021 |
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability R Liu, B Sisman, H Li INTERSPEECH 2021, 2021 | 38 | 2021 |
Transformation of prosody in voice conversion B Sisman, H Li, KC Tan Asia-Pacific Signal and Information Processing Association Annual Summit and …, 2017 | 38 | 2017 |
On the study of Generative Adversarial Networks for Cross-lingual Voice Conversion B Sisman, M Zhang, M Dong, H Li IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019, 2019 | 36 | 2019 |
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis R Liu, B Sisman, H Li IEEE ICASSP 2021 International Conference on Acoustics, Speech, and Signal …, 2021 | 34 | 2021 |
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training K Zhou, B Sisman, H Li INTERSPEECH 2021, 2021 | 30 | 2021 |