Sound event detection: A tutorial

A Mesaros, T Heittola, T Virtanen… - IEEE Signal …, 2021 - ieeexplore.ieee.org
Imagine standing on a street corner in the city. With your eyes closed you can hear and
recognize a succession of sounds: cars passing by, people speaking, their footsteps when …

Big data analytics deep learning techniques and applications: A survey

HA Selmy, HK Mohamed, W Medhat - Information systems, 2024 - Elsevier
Deep learning (DL), as one of the most active machine learning research fields, has
achieved great success in numerous scientific and technological disciplines, including …

WILDS: A benchmark of in-the-wild distribution shifts

PW Koh, S Sagawa, H Marklund… - International …, 2021 - proceedings.mlr.press
Distribution shifts—where the training distribution differs from the test distribution—can
substantially degrade the accuracy of machine learning (ML) systems deployed in the wild …

ML-SUPERB: Multilingual speech universal performance benchmark

J Shi, D Berrebbi, W Chen, HL Chung, EP Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to
benchmark the performance of Self-Supervised Learning (SSL) models on various speech …

CSS10: A collection of single speaker speech datasets for 10 languages

K Park, T Mulc - arXiv preprint arXiv:1903.11269, 2019 - arxiv.org
We describe our development of CSS10, a collection of single speaker speech datasets for
ten languages. It is composed of short audio clips from LibriVox audiobooks and their …

Novel speech recognition systems applied to forensics within child exploitation: Wav2vec2.0 vs. Whisper

JC Vásquez-Correa, A Álvarez Muniain - Sensors, 2023 - mdpi.com
The growth in online child exploitation material is a significant challenge for European Law
Enforcement Agencies (LEAs). One of the most important sources of such online information …

ASR is all you need: Cross-modal distillation for lip reading

T Afouras, JS Chung… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
The goal of this work is to train strong models for visual speech recognition without requiring
human annotated ground truth data. We achieve this by distilling from an Automatic Speech …

End-end speech-to-text translation with modality agnostic meta-learning

S Indurthi, H Han, NK Lakumarapu… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Collecting large amounts of data to train end-to-end Speech Translation (ST) models is more
difficult compared to the ASR and MT tasks. Previous studies have proposed the use of …

Open source automatic speech recognition for German

B Milde, A Köhn - Speech communication; 13th ITG …, 2018 - ieeexplore.ieee.org
High quality Automatic Speech Recognition (ASR) is a prerequisite for speech-based
applications and research. While state-of-the-art ASR software is freely available, the …

Inductive biases, pretraining and fine-tuning jointly account for brain responses to speech

J Millet, JR King - arXiv preprint arXiv:2103.01032, 2021 - arxiv.org
Our ability to comprehend speech remains, to date, unrivaled by deep learning models. This
feat could result from the brain's ability to fine-tune generic sound representations for speech …