A review of physical and perceptual feature extraction techniques for speech, music and environmental sounds

F Alías, JC Socoró, X Sevillano - Applied Sciences, 2016 - mdpi.com
Endowing machines with sensing capabilities similar to those of humans is a prevalent
quest in engineering and computer science. In the pursuit of making computers sense their …

Music emotion recognition: A state of the art review

YE Kim, EM Schmidt, R Migneco, BG Morton… - Proc. ISMIR, 2010 - archives.ismir.net
This paper surveys the state of the art in automatic emotion recognition in music. Music is
oftentimes referred to as a “language of emotion”[1], and it is natural for us to categorize …

Text-to-audio generation using instruction-tuned LLM and latent diffusion model

D Ghosal, N Majumder, A Mehrish, S Poria - arXiv preprint arXiv …, 2023 - arxiv.org
The immense scale of recent large language models (LLMs) enables many interesting
properties, such as instruction- and chain-of-thought-based fine-tuning, that has significantly …

PANNs: Large-scale pretrained audio neural networks for audio pattern recognition

Q Kong, Y Cao, T Iqbal, Y Wang… - … on Audio, Speech …, 2020 - ieeexplore.ieee.org
Audio pattern recognition is an important research topic in the machine learning area, and
includes several tasks such as audio tagging, acoustic scene classification, music …

SMIL: Multimodal learning with severely missing modality

M Ma, J Ren, L Zhao, S Tulyakov, C Wu… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
A common assumption in multimodal learning is the completeness of training data, i.e., full
modalities are available in all training examples. Although there exists research endeavor in …

MERT: Acoustic music understanding model with large-scale self-supervised training

Y Li, R Yuan, G Zhang, Y Ma, X Chen, H Yin… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) has recently emerged as a promising paradigm for training
generalisable models on large-scale data in the fields of vision, text, and speech. Although …

Trends in audio signal feature extraction methods

G Sharma, K Umapathy, S Krishnan - Applied Acoustics, 2020 - Elsevier
Audio signal processing algorithms generally involve analyzing the signal, extracting its
properties, predicting its behaviour, recognizing if any pattern is present in the signal, and …

Deep forest

ZH Zhou, J Feng - National science review, 2019 - academic.oup.com
Current deep-learning models are mostly built upon neural networks, i.e., multiple layers of
parameterized differentiable non-linear modules that can be trained by backpropagation. In …

Singing voice separation with deep U-Net convolutional networks

A Jansson, E Humphrey, N Montecchio, R Bittner… - 2017 - openaccess.city.ac.uk
The decomposition of a music audio signal into its vocal and backing track components is
analogous to image-to-image translation, where a mixed spectrogram is transformed into its …

FMA: A dataset for music analysis

M Defferrard, K Benzi, P Vandergheynst… - arXiv preprint arXiv …, 2016 - arxiv.org
We introduce the Free Music Archive (FMA), an open and easily accessible dataset suitable
for evaluating several tasks in MIR, a field concerned with browsing, searching, and …