Word discovery in visually grounded, self-supervised speech models

P Peng, D Harwath - arXiv preprint arXiv:2203.15081, 2022 - arxiv.org

We present a method for visually-grounded spoken term discovery. After training either a
HuBERT or wav2vec2.0 model to associate spoken captions with natural images, we show …

Cited by 38 · Related articles · All 6 versions

[PDF] arxiv.org

Phone-to-audio alignment without text: A semi-supervised approach

J Zhu, C Zhang, D Jurgens - ICASSP 2022 - 2022 IEEE …, 2022 - ieeexplore.ieee.org

The task of phone-to-audio alignment has many applications in speech research. Here we
introduce two Wav2Vec2-based models for both text-dependent and text-independent …

Cited by 38 · Related articles · All 5 versions

[PDF] arxiv.org

Self-supervised language learning from raw audio: Lessons from the zero resource speech challenge

E Dunbar, N Hamilakis… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org

Recent progress in self-supervised or unsupervised machine learning has opened the
possibility of building a full speech processing system from raw audio without using any …

Cited by 18 · Related articles · All 7 versions

[PDF] arxiv.org

A brief overview of unsupervised neural speech representation learning

L Borgholt, JD Havtorn, J Edin, L Maaløe… - arXiv preprint arXiv …, 2022 - arxiv.org

Unsupervised representation learning for speech processing has matured greatly in the last
few years. Work in computer vision and natural language processing has paved the way, but …

Cited by 9 · Related articles · All 5 versions

[PDF] mit.edu

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu

Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

Cited by 3 · Related articles · All 4 versions

[PDF] arxiv.org

Word segmentation on discovered phone units with dynamic programming and self-supervised scoring

H Kamper - IEEE/ACM Transactions on Audio, Speech, and …, 2022 - ieeexplore.ieee.org

Recent work on unsupervised speech segmentation has used self-supervised models with
phone and word segmentation modules that are trained jointly. This paper instead revisits …

Cited by 27 · Related articles · All 4 versions

[PDF] arxiv.org

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - arXiv preprint arXiv:2307.00162, 2023 - arxiv.org

Many self-supervised speech models (S3Ms) have been introduced over the last few years,
producing performance and data efficiency improvements for a variety of speech tasks …

Cited by 9 · Related articles · All 3 versions

[PDF] neurips.cc

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

S Cuervo, A Lancucki, R Marxer… - Advances in …, 2022 - proceedings.neurips.cc

The success of deep learning comes from its ability to capture the hierarchical structure of
data by learning high-level representations defined in terms of low-level ones. In this paper …

Cited by 16 · Related articles · All 9 versions

[PDF] arxiv.org

Efficient transformers with dynamic token pooling

P Nawrot, J Chorowski, A Łańcucki… - arXiv preprint arXiv …, 2022 - arxiv.org

Transformers achieve unrivalled performance in modelling language, but remain inefficient
in terms of memory and time complexity. A possible remedy is to reduce the sequence …

Cited by 14 · Related articles · All 4 versions

[PDF] arxiv.org

On compressing sequences for self-supervised speech models

Y Meng, HJ Chen, J Shi, S Watanabe… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

Compressing self-supervised models has become increasingly necessary, as self-
supervised models become larger. While previous approaches have primarily focused on …

Cited by 15 · Related articles · All 5 versions
