Multimodal Word Discovery and Retrieval with Phone Sequence and Image Concepts.

A Rouditchenko, A Boggust, D Harwath, B Chen… - arXiv preprint arXiv …, 2020 - arxiv.org

Current methods for learning visually grounded language from videos often rely on text
annotation, such as human generated captions or machine generated automatic speech …

Cited by 156 · Related articles · All 9 versions

[PDF] arxiv.org

Cross-modal discrete representation learning

AH Liu, SY Jin, CIJ Lai, A Rouditchenko, A Oliva… - arXiv preprint arXiv …, 2021 - arxiv.org

Recent advances in representation learning have demonstrated an ability to represent
information from different modalities such as video, text, and audio in a single high-level …

Cited by 47 · Related articles · All 10 versions

[PDF] nsf.gov

A DNN-HMM-DNN hybrid model for discovering word-like units from spoken captions and image regions

L Wang, M Hasegawa-Johnson - Interspeech, 2020 - par.nsf.gov

Discovering word-like units without textual transcriptions is an important step in low-resource
speech technology. In this work, we demonstrate a model inspired by statistical machine …

Cited by 12 · Related articles · All 9 versions

[PDF] nsf.gov

Multimodal word discovery and retrieval with spoken descriptions and visual concepts

L Wang, M Hasegawa-Johnson - IEEE/ACM Transactions on …, 2020 - ieeexplore.ieee.org

In the absence of dictionaries, translators, or grammars, it is still possible to learn some of
the words of a new language by listening to spoken descriptions of images. If several …

Cited by 1 · Related articles · All 4 versions

[PDF] mit.edu

[PDF] Cross-Modal Discrete Representation Learning

AH Liu, SY Jin, CIJ Lai, A Rouditchenko, A Oliva, J Glass - olivalab.mit.edu

In contrast to recent advances focusing on high-level representation learning across
modalities, in this work we present a self-supervised learning framework that is able to learn …

[PDF] mit.edu

Learning Audio-Video Language Representations

A Rouditchenko - 2021 - dspace.mit.edu

Automatic speech recognition has seen recent advancements powered by machine
learning, but it is still only available for a small fraction of the more than 7,000 languages …
