Amphion: An open-source audio, music and speech generation toolkit | X Zhang, L Xue, Y Wang, Y Gu, X Chen, Z Fang, H Chen, L Zou, C Wang, ... | arXiv preprint arXiv:2312.09911, 2023 | 8 | 2023 |
Leveraging content-based features from multiple acoustic models for singing voice conversion | X Zhang, Y Gu, H Chen, Z Fang, L Zou, L Xue, Z Wu | arXiv preprint arXiv:2310.11160, 2023 | 4 | 2023 |
Multi-scale sub-band constant-q transform discriminator for high-fidelity vocoder | Y Gu, X Zhang, L Xue, Z Wu | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 3 | 2024 |
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation | H He, Z Shang, C Wang, X Li, Y Gu, H Hua, L Liu, C Yang, J Li, P Shi, ... | arXiv preprint arXiv:2407.05361, 2024 | | 2024 |
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds | Y Zhang, Y Gu, Y Zeng, Z Xing, Y Wang, Z Wu, K Chen | arXiv preprint arXiv:2407.01494, 2024 | | 2024 |
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder | Y Gu, X Zhang, L Xue, H Li, Z Wu | arXiv preprint arXiv:2404.17161, 2024 | | 2024 |