Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation

C Jacobs, Y Matusevych… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Acoustic word embeddings (AWEs) are fixed-dimensional representations of variable-length
speech segments. For zero-resource languages where labelled data is not available, one …

A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings

L Van Staden, H Kamper - 2021 IEEE Spoken Language …, 2021 - ieeexplore.ieee.org
Many speech processing tasks involve measuring the acoustic similarity between speech
segments. Acoustic word embeddings (AWE) allow for efficient comparisons by mapping …

Unsupervised end-to-end learning of discrete linguistic units for voice conversion

AT Liu, P Hsu, H Lee - arXiv preprint arXiv:1905.11563, 2019 - arxiv.org
We present an unsupervised end-to-end training scheme where we discover discrete
subword units from speech without using any labels. The discrete subword units are learned …

Improved acoustic word embeddings for zero-resource languages using multilingual transfer

H Kamper, Y Matusevych… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Acoustic word embeddings are fixed-dimensional representations of variable-length speech
segments. Such embeddings can form the basis for speech search, indexing and discovery …

Multilingual transfer of acoustic word embeddings improves when training on languages related to the target zero-resource language

C Jacobs, H Kamper - arXiv preprint arXiv:2106.12834, 2021 - arxiv.org
Acoustic word embedding models map variable duration speech segments to fixed
dimensional vectors, enabling efficient speech search and discovery. Previous work …

Multilingual acoustic word embeddings for zero-resource languages

C Jacobs, H Kamper - arXiv preprint arXiv:2401.10543, 2024 - arxiv.org
This research addresses the challenge of developing speech applications for zero-resource
languages that lack labelled data. It specifically uses acoustic word embeddings (AWEs) -- fixed …

Speech personalization and federated training using real world noise

M Sharifi, V Carbune - US Patent 11,741,944, 2023 - Google Patents
A method of training a speech model includes receiving, at a voice-enabled device, a fixed
set of training utterances where each training utterance in the fixed set of training utterances …