Textually pretrained speech language models

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Cited by 17 · Related articles · All 4 versions

Towards audio language modeling-an overview

H Wu, X Chen, YC Lin, K Chang, HL Chung… - arXiv preprint arXiv …, 2024 - arxiv.org

Neural audio codecs are initially introduced to compress audio data into compact codes to
reduce transmission latency. Researchers recently discovered the potential of codecs as …

Cited by 11 · Related articles · All 2 versions

VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks

S Maiti, Y Peng, S Choi, J Jung… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

We propose a decoder-only language model, VoxtLM, that can perform four tasks: speech
recognition, speech synthesis, text generation, and speech continuation. VoxtLM integrates …

Cited by 31 · Related articles · All 3 versions

Diverse and aligned audio-to-video generation via text-to-video model adaptation

G Yariv, I Gat, S Benaim, L Wolf, I Schwartz… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

We consider the task of generating diverse and realistic videos guided by natural audio
samples from a wide variety of semantic classes. For this task, the videos are required to be …

Cited by 18 · Related articles · All 3 versions

Speechgen: Unlocking the generative power of speech language models with prompts

H Wu, KW Chang, YK Wu, H Lee - arXiv preprint arXiv:2306.02207, 2023 - arxiv.org

Large language models (LLMs) have gained considerable attention for Artificial Intelligence
Generated Content (AIGC), particularly with the emergence of ChatGPT. However, the direct …

Cited by 18 · Related articles · All 2 versions

Towards General-Purpose Text-Instruction-Guided Voice Conversion

CY Kuan, CA Li, TY Hsu, TY Lin… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

This paper introduces a novel voice conversion (VC) model, guided by text instructions such
as “articulate slowly with a deep tone” or “speak in a cheerful boyish voice”. Unlike …

Cited by 6 · Related articles · All 3 versions

An exploration of in-context learning for speech language model

MH Hsu, KW Chang, SW Li, H Lee - arXiv preprint arXiv:2310.12477, 2023 - arxiv.org

Ever since the development of GPT-3 in the natural language processing (NLP) field, in-
context learning (ICL) has played an important role in utilizing large language models …

Cited by 7 · Related articles · All 3 versions

Spirit-lm: Interleaved spoken and written language model

TA Nguyen, B Muller, B Yu, MR Costa-Jussa… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and
speech. Our model is based on a pretrained text language model that we extend to the …

Cited by 7 · Related articles · All 4 versions

Speechprompt: Prompting speech language models for speech processing tasks

KW Chang, H Wu, YK Wang, YK Wu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Prompting has become a practical method for utilizing pre-trained language models (LMs).
This approach offers several advantages. It allows an LM to adapt to new tasks with minimal …

Cited by 1 · Related articles · All 5 versions

Prompting and adapter tuning for self-supervised encoder-decoder speech model

KW Chang, MH Chen, YP Lin, JN Hsu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT)
methods. However, existing studies on speech prompting focused on classification tasks …

Cited by 6 · Related articles · All 3 versions
