- Academic resource search

Siamese neural networks: An overview

D Chicco - Artificial neural networks, 2021 - Springer

Similarity has always been a key aspect in computer science and statistics. Any time two
element vectors are compared, many different similarity approaches can be used …

Cited by 624 · Related articles · All 7 versions

[PDF] mpg.de

Temporal modulations in speech and music

N Ding, AD Patel, L Chen, H Butler, C Luo… - … & Biobehavioral Reviews, 2017 - Elsevier

Speech and music have structured rhythms. Here we discuss a major acoustic correlate of
spoken and musical rhythms, the slow (0.25–32 Hz) temporal modulations in sound intensity …

Cited by 469 · Related articles · All 8 versions

[PDF] isca-archive.org

[PDF] Montreal Forced Aligner: Trainable text-speech alignment using Kaldi

M McAuliffe, M Socolof, S Mihuc, M Wagner… - Interspeech, 2017 - isca-archive.org

We present the Montreal Forced Aligner (MFA), a new open-source system for
speech-text alignment. MFA is an update to the Prosodylab-Aligner, and maintains its key …

Cited by 1265 · Related articles · All 12 versions

[PDF] plos.org

Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires

T Sainburg, M Thielk, TQ Gentner - PLoS computational biology, 2020 - journals.plos.org

Animals produce vocalizations that range in complexity from a single repeated call to
hundreds of unique vocal elements patterned in sequences unfolding over hours …

Cited by 206 · Related articles · All 16 versions

[PDF] wiley.com

The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition …

RH Baayen, YY Chuang, E Shafaei-Bajestan… - …, 2019 - Wiley Online Library

The discriminative lexicon is introduced as a mathematical and computational model of the
mental lexicon. This novel theory is inspired by word and paradigm morphology but …

Cited by 225 · Related articles · All 26 versions

[PDF] researchgate.net

A cross-language perspective on speech information rate

F Pellegrino, C Coupé, E Marsico - Language, 2011 - JSTOR

This article is a crosslinguistic investigation of the hypothesis that the average information
rate conveyed during speech communication results from a trade-off between average …

Cited by 428 · Related articles · All 22 versions

[PDF] acm.org

Augmented datasheets for speech datasets and ethical decision-making

O Papakyriakopoulos, ASG Choi, W Thong… - Proceedings of the …, 2023 - dl.acm.org

Speech datasets are crucial for training Speech Language Technologies (SLT); however,
the lack of diversity of the underlying training data can lead to serious limitations in building …

Cited by 16 · Related articles · All 5 versions

[PDF] polyu.edu.hk

Why reduce? Phonological neighborhood density and phonetic reduction in spontaneous speech

S Gahl, Y Yao, K Johnson - Journal of memory and language, 2012 - Elsevier

Frequent or contextually predictable words are often phonetically reduced, i.e. shortened and
produced with articulatory undershoot. Explanations for phonetic reduction of predictable …

Cited by 335 · Related articles · All 14 versions

[PDF] tum.de

Real-life voice activity detection with LSTM recurrent neural networks and an application to Hollywood movies

F Eyben, F Weninger, S Squartini… - 2013 IEEE International …, 2013 - ieeexplore.ieee.org

A novel, data-driven approach to voice activity detection is presented. The approach is
based on Long Short-Term Memory Recurrent Neural Networks trained on standard RASTA …

Cited by 287 · Related articles · All 9 versions

[PDF] researchgate.net

Prosody in context: A review

J Cole - Language, Cognition and Neuroscience, 2015 - Taylor & Francis

Prosody conveys information about the linguistic context of an utterance at every level of
linguistic organisation, from the word up to the discourse context. Acoustic correlates of …

Cited by 233 · Related articles · All 8 versions
