AudioLDM: Text-to-Audio Generation with Latent Diffusion Models H Liu*, Z Chen*, Y Yuan, X Mei, X Liu, D Mandic, W Wang, MD Plumbley Proceedings of the 40th International Conference on Machine Learning 202 …, 2023 | 272 | 2023 |
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality X Tan*, J Chen*, H Liu*, J Cong, C Zhang, Y Liu, X Wang, Y Leng, Y Yi, ... IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 46 (6 …, 2022 | 126 | 2022 |
WavCaps: A ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research X Mei, C Meng, H Liu, Q Kong, T Ko, C Zhao, MD Plumbley, Y Zou, ... arXiv preprint arXiv:2303.17395, 2023 | 100* | 2023 |
Decoupling magnitude and phase estimation with deep ResUNet for music source separation Q Kong, Y Cao, H Liu, K Choi, Y Wang International Society for Music Information Retrieval Conference, 2021 | 81 | 2021 |
VoiceFixer: Toward general speech restoration with neural vocoder H Liu, Q Kong, Q Tian, Y Zhao, DL Wang, C Huang, Y Wang arXiv preprint arXiv:2109.13731, 2021 | 71* | 2021 |
AudioLDM 2: Learning holistic audio generation with self-supervised pretraining H Liu, Q Tian, Y Yuan, X Liu, X Mei, Q Kong, Y Wang, W Wang, Y Wang, ... IEEE/ACM Transactions on Audio, Speech, and Language Processing 32, 2871-2883, 2023 | 70* | 2023 |
BinauralGrad: A two-stage conditional diffusion probabilistic model for binaural audio synthesis Y Leng, Z Chen, J Guo, H Liu, J Chen, X Tan, D Mandic, L He, X Li, T Qin, ... Advances in Neural Information Processing Systems 35, 23689-23700, 2022 | 47 | 2022 |
Separate what you describe: language-queried audio source separation X Liu, H Liu, Q Kong, X Mei, J Zhao, Q Huang, MD Plumbley, W Wang INTERSPEECH, 2022 | 37* | 2022 |
Leveraging pre-trained BERT for audio captioning X Liu, X Mei, Q Huang, J Sun, J Zhao, H Liu, MD Plumbley, V Kilic, ... 2022 30th European Signal Processing Conference (EUSIPCO), 1145-1149, 2022 | 28 | 2022 |
Channel-wise subband input for better voice and accompaniment separation on high resolution music H Liu, L Xie, J Wu, G Yang INTERSPEECH, 2020 | 28 | 2020 |
Neural vocoder is all you need for speech super-resolution H Liu, W Choi, X Liu, Q Kong, Q Tian, DL Wang INTERSPEECH, 2022 | 27 | 2022 |
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies K Chen*, Y Wu*, H Liu*, M Nezhurina, T Berg-Kirkpatrick, S Dubnov ICASSP 2024 IEEE International Conference on Acoustics, Speech and Signal …, 2023 | 24 | 2023 |
Learning to detect an animal sound from five examples I Nolasco, S Singh, V Morfi, V Lostanlen, A Strandburg-Peshkin, ... Ecological informatics 77, 102258, 2023 | 22 | 2023 |
Language-based audio retrieval with pre-trained models X Mei, X Liu, H Liu, J Sun, MD Plumbley, W Wang Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge …, 2022 | 20 | 2022 |
CWS-PResUNet: Music source separation with channel-wise subband phase-aware ResUNet H Liu, Q Kong, J Liu ISMIR Music Demixing (MDX) Workshop, 2021 | 20 | 2021 |
Speech enhancement with weakly labelled data from AudioSet Q Kong, H Liu, X Du, L Chen, R Xia, Y Wang INTERSPEECH, 2021 | 16 | 2021 |
ResGrad: Residual denoising diffusion probabilistic models for text to speech Z Chen, Y Wu, Y Leng, J Chen, H Liu, X Tan, Y Cui, K Wang, L He, S Zhao, ... arXiv preprint arXiv:2212.14518, 2022 | 14 | 2022 |
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention X Liu, Q Huang, X Mei, H Liu, Q Kong, J Sun, S Li, T Ko, Y Zhang, ... INTERSPEECH, 2022 | 13 | 2022 |
WavJourney: Compositional audio creation with large language models X Liu, Z Zhu, H Liu, Y Yuan, M Cui, Q Huang, J Liang, Y Cao, Q Kong, ... arXiv preprint arXiv:2307.14335, 2023 | 12* | 2023 |
Joint echo cancellation and noise suppression based on cascaded magnitude and complex mask estimation X Shu, Y Zhu, Y Chen, L Chen, H Liu, C Huang, Y Wang arXiv preprint arXiv:2107.09298, 2021 | 12 | 2021 |