Unsupervised speech recognition

A Baevski, WN Hsu, A Conneau… - Advances in Neural …, 2021 - proceedings.neurips.cc
Despite rapid progress in the recent past, current speech recognition systems still require
labeled training data which limits this technology to a small fraction of the languages spoken …

Data augmenting contrastive learning of speech representations in the time domain

E Kharitonov, M Rivière, G Synnaeve… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Contrastive Predictive Coding (CPC), based on predicting future segments of speech from
past segments, is emerging as a powerful algorithm for representation learning of speech …

Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge

B Van Niekerk, L Nortje, H Kamper - arXiv preprint arXiv:2005.09409, 2020 - arxiv.org
In this paper, we explore vector quantization for acoustic unit discovery. Leveraging
unlabelled data, we aim to learn discrete representations of speech that separate phonetic …

Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner

E Dupoux - Cognition, 2018 - Elsevier
Spectacular progress in the information processing sciences (machine learning, wearable
sensors) promises to revolutionize the study of cognitive development. Here, we analyse the …

The zero resource speech challenge 2015.

M Versteegh, R Thiolliere, T Schatz, XN Cao… - Interspeech, 2015 - isca-archive.org
The Interspeech 2015 Zero Resource Speech Challenge aims at discovering
subword and word units from raw speech. The challenge provides the first unified and open …

A nonparametric Bayesian approach to acoustic model discovery

C Lee, J Glass - Proceedings of the 50th Annual Meeting of the …, 2012 - aclanthology.org
We investigate the problem of acoustic modeling in which prior language-specific
knowledge and transcribed data are unavailable. We present an unsupervised model that …

[BOOK] Phonological development: The first two years

MM Vihman - 2014 - pure.york.ac.uk
This book provides an extensive overview of research into child production and perception.
It focuses primarily on the first two years of life because, for the majority of children, that …

Learning hierarchical discrete linguistic units from visually-grounded speech

D Harwath, WN Hsu, J Glass - arXiv preprint arXiv:1911.09602, 2019 - arxiv.org
In this paper, we present a method for learning discrete linguistic units by incorporating
vector quantization layers into neural models of visually grounded speech. We show that our …

Evaluating speech features with the minimal-pair ABX task: Analysis of the classical MFC/PLP pipeline

T Schatz, V Peddinti, F Bach, A Jansen… - … 2013: 14th Annual …, 2013 - hal.science
We present a new framework for the evaluation of speech representations in zero-resource
settings, that extends and complements previous work by Carlin, Jansen and Hermansky [1] …

A segmental framework for fully-unsupervised large-vocabulary speech recognition

H Kamper, A Jansen, S Goldwater - Computer Speech & Language, 2017 - Elsevier
Zero-resource speech technology is a growing research area that aims to develop methods
for speech processing in the absence of transcriptions, lexicons, or language modelling text …