Neural codecs have become crucial to recent speech and audio generation research. In addition to signal compression capabilities, discrete codecs have also been found to …
This research presents Muskits-ESPnet, a versatile toolkit that introduces new paradigms to Singing Voice Synthesis (SVS) through the application of pretrained audio models in both …
M Kim, J Choi, D Kim, YM Ro - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
This paper proposes a textless training method for many-to-many multilingual speech-to-speech translation that can also benefit the transfer of pre-trained knowledge to text-based …
Discrete speech representation has proven effective in various downstream applications due to its superior compression of the waveform, fast convergence during training, and …
Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous studies have …
Y Tang, Y Wu, J Shi, Q Jin - arXiv preprint arXiv:2406.08905, 2024 - arxiv.org
Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning …
KW Chang, H Wu, YK Wang, YK Wu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages: it allows an LM to adapt to new tasks with minimal …
P Chen, S Sun, C Shan, Q Yang, L Xie - arXiv preprint arXiv:2406.18862, 2024 - arxiv.org
Unified speech-text models like SpeechGPT, VioLA, and AudioPaLM have shown impressive performance across various speech-related tasks, especially in Automatic …
A Turetzky, Y Adi - arXiv preprint arXiv:2409.03701, 2024 - arxiv.org
Speech tokenization serves as the foundation of speech language models (LMs), enabling them to perform various tasks such as spoken language modeling, text-to-speech, speech-to-…