Existing speech-to-speech translation systems rely heavily on text in the target language: they usually translate the source language either to target text and then synthesize target …
Transformers have made significant strides across various artificial intelligence domains, including natural language processing, computer vision, and audio processing. This …
X Chen, Q Xie - Journal of Sensors, 2023 - Wiley Online Library
Safety helmets play a vital role in protecting workers' heads. In order to improve the accuracy of the detection model in complex environments, such as complex backgrounds and …
The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have …
The idea of combining multiple languages' recordings to train a single automatic speech recognition (ASR) model brings the promise of the emergence of universal speech …
Text-based technologies, such as text translation from one language to another, and image captioning, are gaining popularity. However, approximately half of the world's languages are …
D Merkx, S Scholten, SL Frank, M Ernestus… - Cognitive …, 2023 - Springer
Many computational models of speech recognition assume that the set of target words is already given. This implies that these models learn to recognise speech in a biologically …
Keyword localisation is the task of finding where in a speech utterance a given query keyword occurs. We investigate to what extent keyword localisation is possible using a …
S Feng, M Tu, R Xia, C Huang, Y Wang - arXiv preprint arXiv:2305.11569, 2023 - arxiv.org
We improve low-resource ASR by integrating the ideas of multilingual training and self-supervised learning. Concretely, we leverage an International Phonetic Alphabet (IPA) …