An overview of lead and accompaniment separation in music

Z Rafii, A Liutkus, FR Stöter, SI Mimilakis… - … on Audio, Speech …, 2018 - ieeexplore.ieee.org
Popular music is often composed of an accompaniment and a lead component, the latter
typically consisting of vocals. Filtering such mixtures to extract one or both components has …

[BOOK][B] Audio source separation and speech enhancement

E Vincent, T Virtanen, S Gannot - 2018 - books.google.com
Learn the technology behind hearing aids, Siri, and Echo. Audio source separation and
speech enhancement aim to extract one or more source signals of interest from an audio …

Multichannel extensions of non-negative matrix factorization with complex-valued data

H Sawada, H Kameoka, S Araki… - IEEE Transactions on …, 2013 - ieeexplore.ieee.org
This paper presents new formulations and algorithms for multichannel extensions of non-negative
matrix factorization (NMF). The formulations employ Hermitian positive semidefinite …

Hawkes processes for events in social media

MA Rizoiu, Y Lee, S Mishra, L Xie - Frontiers of multimedia research, 2017 - dl.acm.org
This chapter provides an accessible introduction for point processes, and especially Hawkes
processes, for modeling discrete, inter-dependent events over continuous time. We start by …

Deep learning for video classification and captioning

Z Wu, T Yao, Y Fu, YG Jiang - Frontiers of multimedia research, 2017 - dl.acm.org
Deep learning for video classification and captioning. Part I: Multimedia Content Analysis.
1 Deep Learning for Video Classification and …

Static and dynamic source separation using nonnegative factorizations: A unified view

P Smaragdis, C Fevotte, GJ Mysore… - IEEE Signal …, 2014 - ieeexplore.ieee.org
Source separation models that make use of nonnegativity in their parameters have been
gaining increasing popularity in the last few years, spawning a significant number of …

An unsupervised approach to cochannel speech separation

K Hu, DL Wang - IEEE Transactions on audio, speech, and …, 2012 - ieeexplore.ieee.org
Cochannel (two-talker) speech separation is predominantly addressed using pretrained
speaker dependent models. In this paper, we propose an unsupervised approach to …

A non-negative approach to semi-supervised separation of speech from noise with the use of temporal dynamics

GJ Mysore, P Smaragdis - 2011 IEEE International Conference …, 2011 - ieeexplore.ieee.org
We present a semi-supervised source separation methodology to denoise speech by
modeling speech as one source and noise as the other source. We model speech using the …

Deep polyphonic ADSR piano note transcription

R Kelz, S Böck, G Widmer - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
We investigate a late-fusion approach to piano transcription, combined with a strong
temporal prior in the form of a handcrafted Hidden Markov Model (HMM). The network …

Compositional models for audio processing: Uncovering the structure of sound mixtures

T Virtanen, JF Gemmeke, B Raj… - IEEE Signal Processing …, 2015 - ieeexplore.ieee.org
Many classes of data are composed as constructive combinations of parts. By constructive
combination, we mean additive combination that does not result in subtraction or …