Speech technology for unwritten languages

Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques

G Chrupała - Journal of Artificial Intelligence Research, 2022 - jair.org

This survey provides an overview of the evolution of visually grounded models of spoken
language over the last 20 years. Such models are inspired by the observation that when …

Cited by 51 · Related articles · All 9 versions

UWSpeech: Speech to speech translation for unwritten languages

C Zhang, X Tan, Y Ren, T Qin, K Zhang… - Proceedings of the AAAI …, 2021 - ojs.aaai.org

Existing speech to speech translation systems heavily rely on the text of target language:
they usually translate source language either to target text and then synthesize target …

Cited by 58 · Related articles · All 4 versions

Survey: Transformer-based Models in Data Modality Conversion

E Rashno, A Eskandari, A Anand… - arXiv preprint arXiv …, 2024 - arxiv.org

Transformers have made significant strides across various artificial intelligence domains,
including natural language processing, computer vision, and audio processing. This …

Cited by 1 · Related articles · All 2 versions

Safety Helmet‐Wearing Detection System for Manufacturing Workshop Based on Improved YOLOv7

X Chen, Q Xie - Journal of Sensors, 2023 - Wiley Online Library

Safety helmets play a vital role in protecting workers' heads. In order to improve the accuracy
of the detection model in complex environments, such as complex backgrounds and …

Cited by 14 · Related articles · All 6 versions

Discovering phonetic inventories with crosslingual automatic speech recognition

P Żelasko, S Feng, LM Velazquez, A Abavisani… - Computer Speech & …, 2022 - Elsevier

The high cost of data acquisition makes Automatic Speech Recognition (ASR) model
training problematic for most existing languages, including languages that do not even have …

Cited by 16 · Related articles · All 8 versions

How phonotactics affect multilingual and zero-shot ASR performance

S Feng, P Żelasko, L Moro-Velázquez… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

The idea of combining multiple languages' recordings to train a single automatic speech
recognition (ASR) model brings the promise of the emergence of universal speech …

Cited by 20 · Related articles · All 12 versions

Generating images from spoken descriptions

X Wang, T Qiao, J Zhu, A Hanjalic… - … /ACM Transactions on …, 2021 - ieeexplore.ieee.org

Text-based technologies, such as text translation from one language to another, and image
captioning, are gaining popularity. However, approximately half of the world's languages are …

Cited by 21 · Related articles · All 4 versions

Modelling human word learning and recognition using visually grounded speech

D Merkx, S Scholten, SL Frank, M Ernestus… - Cognitive …, 2023 - Springer

Many computational models of speech recognition assume that the set of target words is
already given. This implies that these models learn to recognise speech in a biologically …

Cited by 14 · Related articles · All 12 versions

Keyword localisation in untranscribed speech using visually grounded speech models

K Olaleye, D Oneaţă, H Kamper - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org

Keyword localisation is the task of finding where in a speech utterance a given query
keyword occurs. We investigate to what extent keyword localisation is possible using a …

Cited by 10 · Related articles · All 5 versions

Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition

S Feng, M Tu, R Xia, C Huang, Y Wang - arXiv preprint arXiv:2305.11569, 2023 - arxiv.org

We improve low-resource ASR by integrating the ideas of multilingual training and self-
supervised learning. Concretely, we leverage an International Phonetic Alphabet (IPA) …

Cited by 4 · Related articles · All 10 versions
