shengpeng ji
Verified email at zju.edu.cn
Title
Cited by
Year
Mega-tts: Zero-shot text-to-speech at scale with intrinsic inductive bias
Z Jiang, Y Ren, Z Ye, J Liu, C Zhang, Q Yang, S Ji, R Huang, C Wang, ...
arXiv preprint arXiv:2306.03509, 2023
Cited by: 49, Year: 2023
Mega-tts 2: Zero-shot text-to-speech with arbitrary length speech prompts
Z Jiang, J Liu, Y Ren, J He, Z Ye, S Ji, Q Yang, C Zhang, P Wei, C Wang, ...
ICLR 2024, 2023
Cited by: 34*, Year: 2023
Textrolspeech: A text style control speech corpus with codec language text-to-speech models
S Ji, J Zuo, M Fang, Z Jiang, F Chen, X Duan, B Huai, Z Zhao
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 19, Year: 2024
Language-codec: Reducing the gaps between discrete codec representation and speech language models
S Ji, M Fang, Z Jiang, R Huang, J Zuo, S Wang, Z Zhao
arXiv preprint arXiv:2402.12208, 2024
Cited by: 7, Year: 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
S Ji, Z Jiang, H Wang, J Zuo, Z Zhao
ACL 2024 Main, 2024
Cited by: 5, Year: 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
S Ji, J Zuo, M Fang, S Zheng, Q Chen, W Wang, Z Jiang, H Huang, ...
arXiv preprint arXiv:2406.01205, 2024
Cited by: 3, Year: 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Alibaba technical report
arXiv preprint arXiv:2407.04051, 2024
Cited by: 2, Year: 2024
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling
M Fang, S Ji, J Zuo, H Huang, Y Xia, J Zhu, X Cheng, X Yang, W Liu, ...
arXiv preprint arXiv:2406.17507, 2024
Cited by: 2, Year: 2024
Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment
H Huang, Y Xia, S Ji, S Wang, H Wang, J Zhu, Z Dong, Z Zhao
arXiv preprint arXiv:2403.05168, 2024
Cited by: 2, Year: 2024
Generating Neural Networks for Diverse Networking Classification Tasks via Hardware-Aware Neural Architecture Search
G Xie, Q Li, Z Shi, H Fang, S Ji, Y Jiang, Z Yuan, L Ma, M Xu
IEEE Transactions on Computers, 2023
Cited by: 2, Year: 2023
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
S Ji, Z Jiang, X Cheng, Y Chen, M Fang, J Zuo, Q Yang, R Li, Z Zhang, ...
arXiv preprint arXiv:2408.16532, 2024
Cited by: 1, Year: 2024
SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning
X Yang, X Cheng, D Fu, M Fang, J Zuo, S Ji, T Jin, Z Zhao
ACM Multimedia 2024, 2024
Cited by: 1, Year: 2024