Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press
We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

Advancements of phonetics in the 21st century: Theoretical issues in sociophonetics

T Kendall, N Pharao, J Stuart-Smith, C Vaughn - Journal of Phonetics, 2023 - Elsevier
Variation in speech has always been important to phonetic theory, but takes center stage in
the growing area of sociophonetics, which places the role of the social at the heart of the …

Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities

S Wang, CHH Yang, J Wu, C Zhang - arXiv preprint arXiv:2404.14716, 2024 - arxiv.org
Large language models (LLMs) can adapt to new tasks through in-context learning (ICL)
based on a few examples presented in dialogue history without any model parameter …

Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications

K Kuhn, V Kersken, G Zimmermann - arXiv preprint arXiv:2408.15616, 2024 - arxiv.org
The Word Error Rate (WER) is the common measure of accuracy for Automatic Speech
Recognition (ASR). Transcripts are usually pre-processed by substituting specific characters …

Mi-Go: tool which uses YouTube as data source for evaluating general-purpose speech recognition machine learning models

T Wojnar, J Hryszko, A Roman - EURASIP Journal on Audio, Speech, and …, 2024 - Springer
This article introduces Mi-Go, a tool aimed at evaluating the performance and
adaptability of general-purpose speech recognition machine learning models across diverse …

Mi-Go: Test Framework which uses YouTube as Data Source for Evaluating Speech Recognition Models like OpenAI's Whisper

T Wojnar, J Hryszko, A Roman - arXiv preprint arXiv:2309.00329, 2023 - arxiv.org
This article introduces Mi-Go, a novel testing framework aimed at evaluating the
performance and adaptability of general-purpose speech recognition machine learning …

Language variation in teacher speech in a dual immersion preschool

X Zhang - Proceedings of the Linguistic Society of …, 2023 - journals.linguisticsociety.org
This study investigates the language input provided for English-Mandarin emergent
bilingual children in a California English-Mandarin dual immersion preschool. As illustrated …

First language dialect experience as a source of individual differences in consonant acquisition

ER Napoli - 2024 - rave.ohiolink.edu
High variability phonetic training (HVPT) is a training paradigm during which individuals are
exposed to second language (L2) input with relatively high levels of speaker and contextual …

Empirical Foundations of Socio-Indexical Structure: Inquiries in Corpus Sociophonetics and Perceptual Learning

K Gunter - 2023 - search.proquest.com
Speech is highly variable and systematic, governed by the internal linguistic system and
socio-indexical factors. The systematic relationship of socio-indexical factors and variable …

[BOOK][B] Language Variation in Dual Immersion Preschools: Teaching and Learning Mandarin Chinese as a Heritage Language

X Zhang - 2023 - search.proquest.com
This dissertation draws on both qualitative and quantitative approaches to investigate the
linguistic practices of teachers and children who are learning Mandarin Chinese as a …