SPICE: Self-supervised pitch estimation

B Gfeller, C Frank, D Roblek, M Sharifi… - … on Audio, Speech …, 2020 - ieeexplore.ieee.org
We propose a model to estimate the fundamental frequency in monophonic audio, often
referred to as pitch estimation. We acknowledge the fact that obtaining ground truth …

Automatic speech recognition over error-prone wireless networks

ZH Tan, P Dalsgaard, B Lindberg - Speech Communication, 2005 - Elsevier
The past decade has witnessed a growing interest in deploying automatic speech
recognition (ASR) in communication networks. The networks such as wireless networks …

MusicYOLO: A vision-based framework for automatic singing transcription

X Wang, B Tian, W Yang, W Xu… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Automatic singing transcription (AST), which refers to the process of inferring the onset,
offset, and pitch from the singing audio, is of great significance in music information retrieval …

Speech magnitude spectrum reconstruction from MFCCs using deep neural network

W Jiang, P Liu, F Wen - Chinese Journal of Electronics, 2018 - Wiley Online Library
This work proposes a Deep neural network (DNN) based method for reconstructing speech
magnitude spectrum from Mel‐frequency cepstral coefficients (MFCCs). We train a DNN …

Gio: A timbre-informed approach for pitch tracking in highly noisy environments

X Sun, X Liang, Q He, B Zhu, Z Ma - Proceedings of the 2022 …, 2022 - dl.acm.org
As one of the fundamental tasks in music and speech signal processing, pitch tracking has
been attracting attention for decades. While a human can focus on the voiced pitch even in …

Adaptation of hidden Markov models for recognizing speech of reduced frame rate

LM Lee, FR Jean - IEEE Transactions on Cybernetics, 2013 - ieeexplore.ieee.org
The frame rate of the observation sequence in distributed speech recognition applications
may be reduced to suit a resource-limited front-end device. In order to use models trained …

A Robust and Low Computational Cost Pitch Estimation Method

D Wang, Y Wei, Y Wang, J Wang - Sensors, 2022 - mdpi.com
Pitch estimation is widely used in speech and audio signal processing. However, the current
methods of modeling harmonic structure used for pitch estimation cannot always match the …

Pitch estimation via self-supervision

B Gfeller, C Frank, D Roblek, M Sharifi… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We present a method to estimate the fundamental frequency in monophonic audio, often
referred to as pitch estimation. In contrast to existing methods, our neural network can be …

Voiced/unvoiced pattern-based duration modeling for language identification

B Yin, E Ambikairajah, F Chen - 2009 IEEE International …, 2009 - ieeexplore.ieee.org
Most existing duration modeling approaches facilitates phone recognizer and require
manually annotated corpus to train the segmentation models, which is usually cost- and time …

Robustness to transmission channel - the DSR approach

D Pearce - COST278 and ISCA Tutorial and Research Workshop …, 2004 - isca-speech.org
The desire for improved user interfaces for distributed speech and multimodal services on
mobile devices has motivated the need for reliable recognition performance over mobile …