LAST: Language Model Aware Speech Tokenization

A Turetzky, Y Adi - arXiv preprint arXiv:2409.03701, 2024 - arxiv.org
Speech tokenization serves as the foundation of speech language models (LMs), enabling
them to perform various tasks such as spoken language modeling, text-to-speech, speech-to …

Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback

GT Lin, PG Shivakumar, A Gourav, Y Gu… - arXiv preprint arXiv …, 2024 - arxiv.org
While textless Spoken Language Models (SLMs) have shown potential in end-to-end
speech-to-speech modeling, they still lag behind text-based Large Language Models …

Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT

R Komatsu, T Shinozaki - 2024 IEEE Spoken Language …, 2024 - ieeexplore.ieee.org
Self-supervised speech representation learning has become essential for extracting
meaningful features from untranscribed audio. Recent advances highlight the potential of …

HuBERTopic: Enhancing Semantic Representation of HuBERT Through Self-Supervision Utilizing Topic Model

T Maekaku, J Shi, X Chang, Y Fujita… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Recently, the usefulness of self-supervised representation learning (SSRL) methods has
been confirmed in various downstream tasks. Many of these models, as exemplified by …

Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model

HC Fang, NX Ye, YJ Shih, P Peng, HF Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in self-supervised speech models have shown significant improvement in
many downstream tasks. However, these models are predominantly centered on frame-level …