Contrastive self-supervised speaker embedding with sequential disentanglement

Y Tu, MW Mak, JT Chien - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Contrastive self-supervised learning has been widely used in speaker embedding to
address the labeling challenge. Contrastive speaker embedding assumes that the contrast …

CCSRD: Content-Centric Speech Representation Disentanglement Learning for End-to-End Speech Translation

X Zhao, H Sun, Y Lei, S Zhu… - Findings of the Association …, 2023 - aclanthology.org
Deep neural networks have demonstrated their capacity in extracting features from speech
inputs. However, these features may include non-linguistic speech factors such as timbre …

Joint Language and Speaker Classification in Naturalistic Bilingual Adult-Toddler Interactions

S Dutta, I López-Espejo, D Irvin, JHL Hansen - Language, 2024 - isca-archive.org
Bilingual children at a young age can benefit from exposure to dual language, impacting
their language and literacy development. Speech technology can aid in developing tools to …

Estimating the completeness of discrete speech units

SL Yeh, H Tang - arXiv preprint arXiv:2409.06109, 2024 - arxiv.org
Representing speech with discrete units has been widely used in speech codec and speech
generation. However, there are several unverified claims about self-supervised discrete …

VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

W Lin, C He, MW Mak, J Lian, KA Lee - arXiv preprint arXiv:2403.00529, 2024 - arxiv.org
Achieving nuanced and accurate emulation of human voice has been a longstanding goal in
artificial intelligence. Although significant progress has been made in recent years, the …