Generative pre-training for speech with autoregressive predictive coding

YA Chung, J Glass - ICASSP 2020 (IEEE International Conference on Acoustics, Speech and Signal Processing), 2020 - ieeexplore.ieee.org
Learning meaningful and general representations from unannotated speech that are
applicable to a wide range of tasks remains challenging. In this paper we propose to use …
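
The autoregressive predictive coding (APC) objective named in this entry is simple enough to sketch. Below is a minimal, illustrative PyTorch version, assuming log-mel input features and an L1 future-frame prediction loss as described in the APC papers; the layer sizes and time shift are placeholders, not values from the paper.

```python
# Illustrative sketch of autoregressive predictive coding (APC):
# a causal RNN predicts the frame `shift` steps ahead under an L1 loss.
import torch
import torch.nn as nn

class APC(nn.Module):
    def __init__(self, n_mels=80, hidden=512, layers=3, shift=3):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, num_layers=layers, batch_first=True)
        self.proj = nn.Linear(hidden, n_mels)  # regress a future log-mel frame
        self.shift = shift                     # how far ahead to predict

    def forward(self, x):                      # x: (batch, time, n_mels)
        h, _ = self.rnn(x)                     # left-to-right (causal) encoding
        pred = self.proj(h[:, :-self.shift])   # predictions for frame t + shift
        target = x[:, self.shift:]             # ground-truth future frames
        loss = torch.abs(pred - target).mean() # L1 regression loss
        return loss, h                         # h doubles as the representation

feats = torch.randn(4, 100, 80)  # a fake batch of log-mel features
loss, reps = APC()(feats)
loss.backward()
```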

Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models

H Kamper - ICASSP 2019 (IEEE International Conference on Acoustics, Speech and Signal Processing), 2019 - ieeexplore.ieee.org
We investigate unsupervised models that can map a variable-duration speech segment to a
fixed-dimensional representation. In settings where unlabelled speech is the only available …
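
The core recipe behind such encoder-decoder acoustic word embeddings can be sketched as follows. This is an illustrative reconstruction, not Kamper's exact model: an RNN encoder collapses a variable-length segment into a fixed vector, and a decoder conditioned on that vector reconstructs a target segment. In the correspondence variant, the target is a different discovered instance of the putatively same word (the weak top-down constraint); with target equal to the input, it reduces to a plain autoencoder, as in the audio word2vec entry below. Names and sizes here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class EncDecAWE(nn.Module):
    """Encoder-decoder acoustic word embedding (illustrative sketch)."""
    def __init__(self, n_mels=13, embed=130, hidden=400):
        super().__init__()
        self.enc = nn.GRU(n_mels, hidden, batch_first=True)
        self.to_embed = nn.Linear(hidden, embed)  # fixed-dimensional AWE
        self.dec = nn.GRU(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mels)

    def embed_segment(self, x):           # x: (batch, time, n_mels)
        _, h = self.enc(x)                # final hidden state summarizes segment
        return self.to_embed(h[-1])       # (batch, embed)

    def forward(self, x, target):
        z = self.embed_segment(x)
        # Feed the embedding at every decoder step (one common conditioning
        # choice); for simplicity this assumes same-length target segments.
        z_rep = z.unsqueeze(1).expand(-1, target.size(1), -1)
        d, _ = self.dec(z_rep)
        recon = self.out(d)
        return ((recon - target) ** 2).mean()  # reconstruction loss

# Correspondence training: `target` is another spoken instance of the same
# word, found by an unsupervised term-discovery system; target == x gives a
# plain segment autoencoder.
```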

Improved speech representations with multi-target autoregressive predictive coding

YA Chung, J Glass - arXiv preprint arXiv:2004.05274, 2020 - arxiv.org
Training objectives based on predictive coding have recently been shown to be very
effective at learning meaningful representations from unlabeled speech. One example is …

Semantic association computation: a comprehensive survey

S Jabeen, X Gao, P Andreae - Artificial Intelligence Review, 2020 - Springer
Semantic association computation is the process of quantifying the strength of a semantic
connection between two textual units, based on different types of semantic relations …

Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation

C Jacobs, Y Matusevych… - 2021 IEEE Spoken Language Technology Workshop (SLT), 2021 - ieeexplore.ieee.org
Acoustic word embeddings (AWEs) are fixed-dimensional representations of variable-length
speech segments. For zero-resource languages where labelled data is not available, one …
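
One common form of such a self-supervised contrastive objective (an assumption here; the paper's exact loss may differ) treats two segments of the same discovered word type as a positive pair and the other in-batch embeddings as negatives:

```python
import torch
import torch.nn.functional as F

def contrastive_awe_loss(anchor, positive, temperature=0.1):
    """NT-Xent-style contrastive loss over acoustic word embeddings.

    anchor, positive: (batch, dim) embeddings of two segments assumed to be
    instances of the same word type; other items in the batch act as negatives.
    Illustrative only; hyperparameters are placeholders.
    """
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature   # (batch, batch) cosine similarities
    labels = torch.arange(a.size(0))   # matching pairs lie on the diagonal
    return F.cross_entropy(logits, labels)

loss = contrastive_awe_loss(torch.randn(32, 128), torch.randn(32, 128))
```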

Completely unsupervised speech recognition by a generative adversarial network harmonized with iteratively refined hidden Markov models

KY Chen, CP Tsai, DR Liu, HY Lee, L Lee - arXiv preprint arXiv …, 2019 - arxiv.org
Producing a large annotated speech corpus for training ASR systems remains difficult for more than 95% of the world's languages, which are low-resourced, but collecting a …

Audio word2vec: Sequence-to-sequence autoencoding for unsupervised learning of audio segmentation and representation

YC Chen, SF Huang, H Lee, YH Wang… - IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019 - ieeexplore.ieee.org
In text processing, word2vec transforms each word into a fixed-size vector used as a basic component in natural language processing applications. Given a large collection of unannotated …

AIPNet: Generative adversarial pre-training of accent-invariant networks for end-to-end speech recognition

YC Chen, Z Yang, CF Yeh, M Jain… - ICASSP 2020 (IEEE International Conference on Acoustics, Speech and Signal Processing), 2020 - ieeexplore.ieee.org
As one of the major sources of speech variability, accents have posed a grand challenge to the robustness of speech recognition systems. In this paper, our goal is to build a unified end …

Acoustically grounded word embeddings for improved acoustics-to-word speech recognition

S Settle, K Audhkhasi, K Livescu… - ICASSP 2019 (IEEE International Conference on Acoustics, Speech and Signal Processing), 2019 - ieeexplore.ieee.org
Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition are
simpler to train, and more efficient to decode with, than sub-word systems. However, A2W …

Multilingual acoustic word embedding models for processing zero-resource languages

H Kamper, Y Matusevych… - ICASSP 2020 (IEEE International Conference on Acoustics, Speech and Signal Processing), 2020 - ieeexplore.ieee.org
Acoustic word embeddings are fixed-dimensional representations of variable-length speech
segments. In settings where unlabelled speech is the only available resource, such …