Meta learning for natural language processing: A survey

H Lee, SW Li, NT Vu - arXiv preprint arXiv:2205.01500, 2022 - arxiv.org
Deep learning has been the mainstream technique in the natural language processing (NLP)
area. However, these techniques require large amounts of labeled data and are less generalizable across …

VioLA: Conditional language models for speech recognition, synthesis, and translation

T Wang, L Zhou, Z Zhang, Y Wu, S Liu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Recent research shows a significant convergence in model architectures, training objectives, and
inference methods across various tasks for different modalities. In this paper, we propose …

SpeechT5: Unified-modal encoder-decoder pre-training for spoken language processing

J Ao, R Wang, L Zhou, C Wang, S Ren, Y Wu… - arXiv preprint arXiv …, 2021 - arxiv.org
Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural
language processing models, we propose a unified-modal SpeechT5 framework that …

LauraGPT: Listen, attend, understand, and regenerate audio with GPT

Z Du, J Wang, Q Chen, Y Chu, Z Gao, Z Li, K Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Pre-trained Transformer (GPT) models have achieved remarkable performance
on various natural language processing tasks, and have shown great potential as …

SpeechVerse: A large-scale generalizable audio language model

N Das, S Dingliwal, S Ronanki, R Paturi… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown incredible proficiency in performing tasks that
require semantic understanding of natural language instructions. Recently, many works …

SpeechGen: Unlocking the generative power of speech language models with prompts

H Wu, KW Chang, YK Wu, H Lee - arXiv preprint arXiv:2306.02207, 2023 - arxiv.org
Large language models (LLMs) have gained considerable attention for Artificial Intelligence
Generated Content (AIGC), particularly with the emergence of ChatGPT. However, the direct …

Losses can be blessings: Routing self-supervised speech representations towards efficient multilingual and multitask speech processing

Y Fu, Y Zhang, K Qian, Z Ye, Z Yu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Self-supervised learning (SSL) for rich speech representations has achieved empirical
success in low-resource Automatic Speech Recognition (ASR) and other speech processing …

GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot

A Zeng, Z Du, M Liu, K Wang, S Jiang, L Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce GLM-4-Voice, an intelligent and human-like end-to-end spoken chatbot. It
supports both Chinese and English, engages in real-time voice conversations, and varies …

Speech representation learning through self-supervised pretraining and multi-task finetuning

YC Chen, S Yang, CK Lee, S See, H Lee - arXiv preprint arXiv:2110.09930, 2021 - arxiv.org
Speech representation learning plays a vital role in speech processing. Among existing
approaches, self-supervised learning (SSL) has become an important research direction. It has been shown …