ProDiff: Progressive fast diffusion model for high-quality text-to-speech R Huang, Z Zhao, H Liu, J Liu, C Cui, Y Ren Proceedings of the 30th ACM International Conference on Multimedia, 2595-2605, 2022 | 131 | 2022 |
TranSpeech: Speech-to-speech translation with bilateral perturbation R Huang, J Liu, H Liu, Y Ren, L Zhang, J He, Z Zhao arXiv preprint arXiv:2205.12523, 2022 | 40 | 2022 |
MixSpeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition X Cheng, T Jin, R Huang, L Li, W Lin, Z Wang, Y Wang, H Liu, A Yin, ... Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 14 | 2023 |
RMSSinger: realistic-music-score based singing voice synthesis J He, J Liu, Z Ye, R Huang, C Cui, H Liu, Z Zhao arXiv preprint arXiv:2305.10686, 2023 | 9 | 2023 |
AV-TranSpeech: Audio-visual robust speech-to-speech translation R Huang, H Liu, X Cheng, Y Ren, L Li, Z Ye, J He, L Zhang, J Liu, X Yin, ... arXiv preprint arXiv:2305.15403, 2023 | 8 | 2023 |
ViT-TTS: visual text-to-speech with scalable diffusion transformer H Liu, R Huang, X Lin, W Xu, M Zheng, H Chen, J He, Z Zhao arXiv preprint arXiv:2305.12708, 2023 | 6 | 2023 |
Wav2SQL: Direct generalizable speech-to-SQL parsing H Liu, R Huang, J He, G Sun, R Shen, X Cheng, Z Zhao arXiv preprint arXiv:2305.12552, 2023 | 2 | 2023 |
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control H Liu, J Wang, R Huang, Y Liu, J Xu, Z Zhao arXiv preprint arXiv:2407.13220, 2024 | | 2024 |
AudioLCM: Text-to-Audio Generation with Latent Consistency Models H Liu, R Huang, Y Liu, H Cao, J Wang, X Cheng, S Zheng, Z Zhao arXiv preprint arXiv:2406.00356, 2024 | | 2024 |