The zero resource speech challenge 2015.

M Versteegh, R Thiolliere, T Schatz, XN Cao… - Interspeech, 2015 - isca-archive.org
The Interspeech 2015 Zero Resource Speech Challenge aims at discovering
subword and word units from raw speech. The challenge provides the first unified and open …

Recent developments in spoken term detection: a survey

A Mandal, KR Prasanna Kumar, P Mitra - International Journal of Speech …, 2014 - Springer
Spoken term detection (STD) provides an efficient means for content based indexing of
speech. However, achieving high detection performance, faster speed, detecting out-of …

Segmental contrastive predictive coding for unsupervised word segmentation

S Bhati, J Villalba, P Żelasko, L Moro-Velazquez… - arXiv preprint arXiv …, 2021 - arxiv.org
Automatic detection of phoneme or word-like units is one of the core objectives in zero-
resource speech processing. Recent attempts employ self-supervised training methods …

Self-supervised language learning from raw audio: Lessons from the zero resource speech challenge

E Dunbar, N Hamilakis… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
Recent progress in self-supervised or unsupervised machine learning has opened the
possibility of building a full speech processing system from raw audio without using any …

Spoken content retrieval—beyond cascading speech recognition with text retrieval

L Lee, J Glass, H Lee, C Chan - IEEE/ACM Transactions on …, 2015 - ieeexplore.ieee.org
Spoken content retrieval refers to directly indexing and retrieving spoken content based on
the audio rather than text descriptions. This potentially eliminates the requirement of …

Unsupervised speech segmentation and variable rate representation learning using segmental contrastive predictive coding

S Bhati, J Villalba, P Żelasko… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
Typically, unsupervised segmentation of speech into the phone- and word-like units are
treated as separate tasks and are often done via different methods which do not fully …

Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study.

H Chen, CC Leung, L Xie, B Ma, H Li - INTERSPEECH, 2015 - isca-archive.org
We adopt a Dirichlet process Gaussian mixture model (DPGMM) for unsupervised acoustic
modeling and represent speech frames with Gaussian posteriorgrams. The model performs …

ODSQA: Open-domain spoken question answering dataset

CH Lee, SM Wang, HC Chang… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Reading comprehension by machine has been widely studied, but machine comprehension
of spoken content is still a less investigated problem. In this paper, we release Open-Domain …

Acoustic segment modeling with spectral clustering methods

H Wang, T Lee, CC Leung, B Ma… - IEEE/ACM Transactions …, 2015 - ieeexplore.ieee.org
This paper presents a study of spectral clustering-based approaches to acoustic segment
modeling (ASM). ASM aims at finding the underlying phoneme-like speech units and …

Building an ASR system for a low-resource language through the adaptation of a high-resource language ASR system: preliminary results

O Scharenborg, F Ciannella, S Palaskar… - … on Natural Language …, 2017 - researchgate.net
For many languages in the world, not enough (annotated) speech data is available to train
an ASR system. We here propose a new three-step method to build an ASR system for such …