Self-supervised neural factor analysis for disentangling utterance-level speech representations

Y Tu, MW Mak, JT Chien - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org

Contrastive self-supervised learning has been widely used in speaker embedding to
address the labeling challenge. Contrastive speaker embedding assumes that the contrast …

Cited by 5 · Related articles · All 2 versions

CCSRD: Content-Centric Speech Representation Disentanglement Learning for End-to-End Speech Translation

X Zhao, H Sun, Y Lei, S Zhu… - Findings of the Association …, 2023 - aclanthology.org

Deep neural networks have demonstrated their capacity in extracting features from speech
inputs. However, these features may include non-linguistic speech factors such as timbre …

Cited by 4 · Related articles · All 3 versions

Joint Language and Speaker Classification in Naturalistic Bilingual Adult-Toddler Interactions

S Dutta, I López-Espejo, D Irvin, JHL Hansen - Language, 2024 - isca-archive.org

Bilingual children at a young age can benefit from exposure to dual language, impacting
their language and literacy development. Speech technology can aid in developing tools to …

Cited by 1 · Related articles · All 3 versions

Estimating the completeness of discrete speech units

SL Yeh, H Tang - arXiv preprint arXiv:2409.06109, 2024 - arxiv.org

Representing speech with discrete units has been widely used in speech codec and speech
generation. However, there are several unverified claims about self-supervised discrete …

VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

W Lin, C He, MW Mak, J Lian, KA Lee - arXiv preprint arXiv:2403.00529, 2024 - arxiv.org

Achieving nuanced and accurate emulation of human voice has been a longstanding goal in
artificial intelligence. Although significant progress has been made in recent years, the …
