Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level …
Discovering word-like units without textual transcriptions is an important step in low-resource speech technology. In this work, we demonstrate a model inspired by statistical machine …
In the absence of dictionaries, translators, or grammars, it is still possible to learn some of the words of a new language by listening to spoken descriptions of images. If several …
In contrast to recent advances focusing on highlevel representation learning across modalities, in this work we present a self-supervised learning framework that is able to learn …
Automatic speech recognition has seen recent advancements powered by machine learning, but it is still only available for a small fraction of the more than 7,000 languages …