A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Deep learning-based electroencephalography analysis: a systematic review

Y Roy, H Banville, I Albuquerque… - Journal of neural …, 2019 - iopscience.iop.org
Context. Electroencephalography (EEG) is a complex signal and can require several years
of training, as well as advanced signal processing and feature extraction methodologies to …

Dataset condensation with gradient matching

B Zhao, KR Mopuri, H Bilen - arXiv preprint arXiv:2006.05929, 2020 - arxiv.org
As the state-of-the-art machine learning methods in many fields rely on larger datasets,
storing datasets and training models on them become significantly more expensive. This …

Applications of artificial intelligence in dentistry: A comprehensive review

F Carrillo‐Perez, OE Pecho, JC Morales… - Journal of Esthetic …, 2022 - Wiley Online Library
Objective: To perform a comprehensive review of the use of artificial intelligence (AI) and
machine learning (ML) in dentistry, providing the community with a broad insight on the …

GigaSpeech: An evolving, multi-domain ASR corpus with 10,000 hours of transcribed audio

G Chen, S Chai, G Wang, J Du, WQ Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition
corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and …

Artificial intelligence and the modern productivity paradox

E Brynjolfsson, D Rock, C Syverson - The economics of artificial …, 2019 - degruyter.com
In this chapter, we review the evidence and explanations for the modern productivity
paradox and propose a resolution. Namely, there is no inherent inconsistency between …

CHiME-6 challenge: Tackling multispeaker speech recognition for unsegmented recordings

S Watanabe, M Mandel, J Barker, E Vincent… - arXiv preprint arXiv …, 2020 - arxiv.org
Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the
6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge …

Opportunities and obstacles for deep learning in biology and medicine

T Ching, DS Himmelstein… - Journal of the …, 2018 - royalsocietypublishing.org
Deep learning describes a class of machine learning algorithms that are capable of
combining raw inputs into layers of intermediate features. These algorithms have recently …

Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks

M Kolbæk, D Yu, ZH Tan… - IEEE/ACM Transactions on …, 2017 - ieeexplore.ieee.org
In this paper, we propose the utterance-level permutation invariant training (uPIT) technique.
uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker …

The Microsoft 2017 conversational speech recognition system

W Xiong, L Wu, F Alleva, J Droppo… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
We describe the latest version of Microsoft's conversational speech recognition system for
the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to …