Cross-lingual word embeddings for low-resource language modeling

O Adams, A Makarucha, G Neubig… - Proceedings of the …, 2017 - aclanthology.org
Most languages have no established writing system and minimal written records. However,
textual data is essential for natural language processing, and particularly important for …

Integrating automatic transcription into the language documentation workflow: Experiments with Na data and the Persephone toolkit

A Michaud, O Adams, TA Cohn, G Neubig… - 2018 - scholarspace.manoa.hawaii.edu
Automatic speech recognition tools have potential for facilitating language documentation,
but in practice these tools remain little-used by linguists for a variety of reasons, such as that …

Advances in subword-based HMM-DNN speech recognition across languages

P Smit, S Virpioja, M Kurimo - Computer Speech & Language, 2021 - Elsevier
We describe a novel way to implement subword language models in speech recognition
systems based on weighted finite state transducers, hidden Markov models, and deep …

Automatic speech recognition for supporting endangered language documentation

E Prud'hommeaux, R Jimerson, R Hatcher… - 2021 - scholarspace.manoa.hawaii.edu
Generating accurate word-level transcripts of recorded speech for language documentation
is difficult and time-consuming, even for skilled speakers of the target language. Automatic …

Analytical review of methods for solving data scarcity issues regarding elaboration of automatic speech recognition systems for low-resource languages

IS Kipyatkova, IA Kagirov - Informatics and Automation, 2022 - journals.rcsi.science
In this paper, principal methods for solving training data issues for the so-called low-
resource languages are discussed, regarding elaboration of automatic speech recognition …

Automatic construction of the Finnish parliament speech corpus

A Mansikkaniemi, P Smit, M Kurimo - INTERSPEECH, 2017 - research.aalto.fi
Automatic speech recognition (ASR) systems require large amounts of transcribed speech
data, for training state-of-the-art deep neural network (DNN) acoustic models. Transcribed …

Automatic speech recognition with very large conversational Finnish and Estonian vocabularies

S Enarvi, P Smit, S Virpioja… - IEEE/ACM Transactions …, 2017 - ieeexplore.ieee.org
Today, the vocabulary size for language models in large vocabulary speech recognition is
typically several hundreds of thousands of words. While this is already sufficient in some …

Automatic transcription challenges for Inuktitut, a low-resource polysynthetic language

V Gupta, G Boulianne - … of the Twelfth Language Resources and …, 2020 - aclanthology.org
We introduce the first attempt at automatic speech recognition (ASR) in Inuktitut, as a
representative for polysynthetic, low-resource languages, like many of the 900 Indigenous …

Entropy-argumentative concept of computational phonetic analysis of speech taking into account dialect and individuality of phonation

V Kovtun, O Kovtun, A Semenov - Entropy, 2022 - mdpi.com
In this article, the concept (i.e., the mathematical model and methods) of computational
phonetic analysis of speech with an analytical description of the phenomenon of phonetic …

Analytical review of methods for solving the problem of small datasets in the creation of automatic speech recognition systems for low-resource …

IS Kipyatkova, IA Kagirov - Informatics and Automation, 2022 - mathnet.ru
The article considers the main methods for solving the problem of small training
datasets in the creation of automatic speech recognition systems for so-called …