The ETSI extended distributed speech recognition (DSR) standards: server-side speech reconstruction

B Gfeller, C Frank, D Roblek, M Sharifi… - … on Audio, Speech …, 2020 - ieeexplore.ieee.org

We propose a model to estimate the fundamental frequency in monophonic audio, often
referred to as pitch estimation. We acknowledge the fact that obtaining ground truth …

Cited by 106

Automatic speech recognition over error-prone wireless networks

ZH Tan, P Dalsgaard, B Lindberg - Speech Communication, 2005 - Elsevier

The past decade has witnessed a growing interest in deploying automatic speech
recognition (ASR) in communication networks. The networks such as wireless networks …

Cited by 61

Musicyolo: A vision-based framework for automatic singing transcription

X Wang, B Tian, W Yang, W Xu… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org

Automatic singing transcription (AST), which refers to the process of inferring the onset,
offset, and pitch from the singing audio, is of great significance in music information retrieval …

Cited by 8

Speech magnitude spectrum reconstruction from MFCCs using deep neural network

W Jiang, P Liu, F Wen - Chinese Journal of Electronics, 2018 - Wiley Online Library

This work proposes a Deep neural network (DNN) based method for reconstructing speech
magnitude spectrum from Mel‐frequency cepstral coefficients (MFCCs). We train a DNN …

Cited by 20

Gio: A timbre-informed approach for pitch tracking in highly noisy environments

X Sun, X Liang, Q He, B Zhu, Z Ma - Proceedings of the 2022 …, 2022 - dl.acm.org

As one of the fundamental tasks in music and speech signal processing, pitch tracking has
been attracting attention for decades. While a human can focus on the voiced pitch even in …

Cited by 4

Adaptation of hidden Markov models for recognizing speech of reduced frame rate

LM Lee, FR Jean - IEEE Transactions on Cybernetics, 2013 - ieeexplore.ieee.org

The frame rate of the observation sequence in distributed speech recognition applications
may be reduced to suit a resource-limited front-end device. In order to use models trained …

Cited by 25

A Robust and Low Computational Cost Pitch Estimation Method

D Wang, Y Wei, Y Wang, J Wang - Sensors, 2022 - mdpi.com

Pitch estimation is widely used in speech and audio signal processing. However, the current
methods of modeling harmonic structure used for pitch estimation cannot always match the …

Cited by 4

Pitch estimation via self-supervision

B Gfeller, C Frank, D Roblek, M Sharifi… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

We present a method to estimate the fundamental frequency in monophonic audio, often
referred to as pitch estimation. In contrast to existing methods, our neural network can be …

Cited by 8

Voiced/unvoiced pattern-based duration modeling for language identification

B Yin, E Ambikairajah, F Chen - 2009 IEEE International …, 2009 - ieeexplore.ieee.org

Most existing duration modeling approaches facilitates phone recognizer and require
manually annotated corpus to train the segmentation models, which is usually cost-and time …

Cited by 14

Robustness to transmission channel-the DSR approach

D Pearce - COST278 and ISCA Tutorial and Research Workshop …, 2004 - isca-speech.org

The desire for improved user interfaces for distributed speech and multimodal services on
mobile devices has motivated the need for reliable recognition performance over mobile …

Cited by 15
