Speak foreign languages with your own voice: Cross-lingual neural codec language modeling

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

被引用次数：150 相关文章所有 6 个版本

[PDF] arxiv.org

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

被引用次数：19 相关文章所有 4 个版本

[PDF] github.io

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - arXiv preprint arXiv …, 2023 - arxiv.org

For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing
the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are …

被引用次数：488 相关文章所有 4 个版本

[PDF] arxiv.org

Naturalspeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies …

被引用次数：155 相关文章所有 3 个版本

[PDF] arxiv.org

Uniaudio: An audio foundation model toward universal audio generation

D Yang, J Tian, X Tan, R Huang, S Liu, X Chang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language models (LLM) have demonstrated the capability to handle a variety of
generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific …

被引用次数：67 相关文章所有 3 个版本

[PDF] arxiv.org

Viola: Unified codec language models for speech recognition, synthesis, and translation

T Wang, L Zhou, Z Zhang, Y Wu, S Liu, Y Gaur… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent research shows a big convergence in model architecture, training objectives, and
inference methods across various tasks for different modalities. In this paper, we propose …

被引用次数：75 相关文章所有 2 个版本

[PDF] arxiv.org

Speechx: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

被引用次数：51 相关文章所有 2 个版本

[PDF] arxiv.org

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org

What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

被引用次数：78 相关文章

[PDF] arxiv.org

Character-llm: A trainable agent for role-playing

Y Shao, L Li, J Dai, X Qiu - arXiv preprint arXiv:2310.10158, 2023 - arxiv.org

Large language models (LLMs) can be used to serve as agents to simulate human
behaviors, given the powerful ability to understand human instructions and provide high …

被引用次数：94 相关文章所有 5 个版本

[PDF] arxiv.org

On decoder-only architecture for speech-to-text and large language model integration

J Wu, Y Gaur, Z Chen, L Zhou, Y Zhu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Large language models (LLMs) have achieved remarkable success in the field of natural
language processing, enabling better human-computer interaction using natural language …

被引用次数：74 相关文章所有 3 个版本

高级搜索

QQ 群