Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Machine Learning, Deep Learning and Statistical Analysis for forecasting building energy consumption—A systematic review

M Khalil, AS McGough, Z Pourmirza… - … Applications of Artificial …, 2022 - Elsevier
The building sector accounts for 36% of the total global energy usage and 40% of
associated Carbon Dioxide emissions. Therefore, the forecasting of building energy …

Self-supervised learning for time series analysis: Taxonomy, progress, and prospects

K Zhang, Q Wen, C Zhang, R Cai, M Jin… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Self-supervised learning (SSL) has recently achieved impressive performance on various
time series tasks. The most prominent advantage of SSL is that it reduces the dependence …

Seen and unseen emotional style transfer for voice conversion with a new emotional speech dataset

K Zhou, B Sisman, R Liu, H Li - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Emotional voice conversion aims to transform emotional prosody in speech while preserving
the linguistic content and speaker identity. Prior studies show that it is possible to …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Speech technology for healthcare: Opportunities, challenges, and state of the art

S Latif, J Qadir, A Qayyum, M Usama… - IEEE Reviews in …, 2020 - ieeexplore.ieee.org
Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …

Towards learning a universal non-semantic representation of speech

J Shor, A Jansen, R Maor, O Lang, O Tuval… - arXiv preprint arXiv …, 2020 - arxiv.org
The ultimate goal of transfer learning is to reduce labeled data requirements by exploiting a
pre-existing embedding model trained for different datasets or tasks. The visual and …

Conventional and contemporary approaches used in text to speech synthesis: A review

N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer
Nowadays speech synthesis or text to speech (TTS), an ability of system to produce human
like natural sounding voice from the written text, is gaining popularity in the field of speech …

Beyond just vision: A review on self-supervised representation learning on multimodal and temporal data

S Deldari, H Xue, A Saeed, J He, DV Smith… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, Self-Supervised Representation Learning (SSRL) has attracted much attention in
the field of computer vision, speech, natural language processing (NLP), and recently, with …

Multi-channel spectrograms for speech processing applications using deep learning methods

T Arias-Vergara, P Klumpp, JC Vasquez-Correa… - Pattern Analysis and …, 2021 - Springer
Time–frequency representations of the speech signals provide dynamic information about
how the frequency component changes with time. In order to process this information, deep …