Siamese neural networks: An overview

D Chicco - Artificial neural networks, 2021 - Springer
Similarity has always been a key aspect in computer science and statistics. Any time two
element vectors are compared, many different similarity approaches can be used …

Temporal modulations in speech and music

N Ding, AD Patel, L Chen, H Butler, C Luo… - … & Biobehavioral Reviews, 2017 - Elsevier
Speech and music have structured rhythms. Here we discuss a major acoustic correlate of
spoken and musical rhythms, the slow (0.25–32 Hz) temporal modulations in sound intensity …

Montreal Forced Aligner: Trainable text-speech alignment using Kaldi

M McAuliffe, M Socolof, S Mihuc, M Wagner… - Interspeech, 2017 - isca-archive.org
We present the Montreal Forced Aligner (MFA), a new open-source system for
speech-text alignment. MFA is an update to the Prosodylab-Aligner, and maintains its key …

Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires

T Sainburg, M Thielk, TQ Gentner - PLoS computational biology, 2020 - journals.plos.org
Animals produce vocalizations that range in complexity from a single repeated call to
hundreds of unique vocal elements patterned in sequences unfolding over hours …

The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition …

RH Baayen, YY Chuang, E Shafaei-Bajestan… - …, 2019 - Wiley Online Library
The discriminative lexicon is introduced as a mathematical and computational model of the
mental lexicon. This novel theory is inspired by word and paradigm morphology but …

A cross-language perspective on speech information rate

F Pellegrino, C Coupé, E Marsico - Language, 2011 - JSTOR
This article is a crosslinguistic investigation of the hypothesis that the average information
rate conveyed during speech communication results from a trade-off between average …

Augmented datasheets for speech datasets and ethical decision-making

O Papakyriakopoulos, ASG Choi, W Thong… - Proceedings of the …, 2023 - dl.acm.org
Speech datasets are crucial for training Speech Language Technologies (SLT); however,
the lack of diversity of the underlying training data can lead to serious limitations in building …

Why reduce? Phonological neighborhood density and phonetic reduction in spontaneous speech

S Gahl, Y Yao, K Johnson - Journal of memory and language, 2012 - Elsevier
Frequent or contextually predictable words are often phonetically reduced, i.e., shortened and
produced with articulatory undershoot. Explanations for phonetic reduction of predictable …

Real-life voice activity detection with LSTM recurrent neural networks and an application to Hollywood movies

F Eyben, F Weninger, S Squartini… - 2013 IEEE International …, 2013 - ieeexplore.ieee.org
A novel, data-driven approach to voice activity detection is presented. The approach is
based on Long Short-Term Memory Recurrent Neural Networks trained on standard RASTA …

Prosody in context: A review

J Cole - Language, Cognition and Neuroscience, 2015 - Taylor & Francis
Prosody conveys information about the linguistic context of an utterance at every level of
linguistic organisation, from the word up to the discourse context. Acoustic correlates of …