Making more of little data: Improving low-resource automatic speech recognition using data augmentation

M Bartelds, N San, B McDonnell, D Jurafsky… - arXiv preprint arXiv …, 2023 - arxiv.org
The performance of automatic speech recognition (ASR) systems has advanced
substantially in recent years, particularly for languages for which a large amount of …

Neural representations for modeling variation in speech

M Bartelds, W de Vries, F Sanal, C Richter… - Journal of …, 2022 - Elsevier
Variation in speech is often quantified by comparing phonetic transcriptions of the same
utterance. However, manually transcribing speech is time-consuming and error prone. As an …

Bottom-up discovery of structure and variation in response tokens ('backchannels') across diverse languages

A Liesenfeld, M Dingemanse - Interspeech 2022, 2022 - pure.mpg.de
Response tokens (also known as backchannels, continuers, or feedback) are a frequent
feature of human interaction, where they serve to display understanding and streamline turn …

Efficiency-oriented approaches for self-supervised speech representation learning

L Lugo, V Vielzeuf - International Journal of Speech Technology, 2024 - Springer
Self-supervised learning enables the training of large neural models without the need for
large, labeled datasets. It has been generating breakthroughs in several fields, including …

Comparing language-specific and cross-language acoustic models for low-resource phonetic forced alignment

E Chodroff, E Ahn, H Dolatian - Language Documentation & …, 2024 - eleanorchodroff.com
Phonetic forced alignment can greatly expedite spoken language analysis by providing
automatic time alignments at the word- and phone-levels. In the case of low-resource languages, it …

On the nature of discrete speech representations in multilingual self-supervised models

BM Abdullah, MM Shaik, D Klakow - Proceedings of the 5th …, 2023 - aclanthology.org
Self-supervision has emerged as an effective paradigm for learning representations of
spoken language from raw audio without explicit labels or transcriptions. Self-supervised …

Automated speech tools for helping communities process restricted-access corpora for language revival efforts

N San, M Bartelds, T Ogunremi, A Mount… - arXiv preprint arXiv …, 2022 - arxiv.org
Many archival recordings of speech from endangered languages remain unannotated and
inaccessible to community members and language learning programs. One bottleneck is the …

Analyzing the representational geometry of acoustic word embeddings

BM Abdullah, D Klakow - arXiv preprint arXiv:2301.03012, 2023 - arxiv.org
Acoustic word embeddings (AWEs) are vector representations such that different acoustic
exemplars of the same word are projected nearby in the embedding space. In addition to …

BanSpeech: A Multi-domain Bangla Speech Recognition Benchmark Towards Robust Performance in Challenging Conditions

AM Samin, MH Kobir, MMS Rafee, MF Ahmed… - IEEE …, 2024 - ieeexplore.ieee.org
Despite huge improvements in automatic speech recognition (ASR) employing neural
networks, ASR systems still suffer from robustness and generalizability issues due …

Wave to Interlingua: Analyzing Representations of Multilingual Speech Transformers for Spoken Language Translation

BM Abdullah, MM Shaik, D Klakow - Proc. Interspeech 2024, 2024 - isca-archive.org
In Transformer-based Speech-to-Text (S2T) translation, an encoder-decoder model
is trained end-to-end to take as input an untranscribed acoustic signal in the source …