A review of deep learning techniques in audio event recognition (AER) applications

A Prashanth, SL Jayalakshmi… - Multimedia Tools and …, 2024 - Springer
In our day-to-day life, observation of human and social actions is highly important for public
protection and security. Additionally, identifying suspicious activity is also essential in critical …

PVASS-MDD: predictive visual-audio alignment self-supervision for multimodal deepfake detection

Y Yu, X Liu, R Ni, S Yang, Y Zhao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deepfake techniques can forge the visual or audio signals in the video, which leads to
inconsistencies between visual and audio (VA) signals. Therefore, multimodal detection …

Generating music with emotions

C Bao, Q Sun - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org
We focus on music generation conditioned on human emotions, specifically positive
and negative emotions. There are no existing large-scale music datasets with the annotation …

Multi-task learning-based spoofing-robust automatic speaker verification system

Y Zhao, R Togneri, V Sreeram - Circuits, Systems, and Signal Processing, 2022 - Springer
Spoofing attacks posed by generating artificial speech can severely degrade the
performance of a speaker verification system. Recently, many anti-spoofing …

Chord-based music generation using long short-term memory neural networks in the context of artificial intelligence

F Li - The Journal of Supercomputing, 2024 - Springer
With the rapid development of artificial intelligence (AI), music generation has gained
widespread attention. Long short-term memory (LSTM) has advantages in handling time …

Neural synthesis of sound effects using flow-based deep generative models

S Andreu, MV Aylagas - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Creating variations of sound effects for video games is a time-consuming task that grows
with the size and complexity of the games themselves. The process usually comprises …

Replay anti-spoofing countermeasure based on data augmentation with post selection

Y Zhao, R Togneri, V Sreeram - Computer Speech & Language, 2020 - Elsevier
Automatic Speaker Verification (ASV) systems have been widely applied for
speaker authentication for biometric security especially in e-business scenarios. However …

Gotta Hear Them All: Sound Source Aware Vision to Audio Generation

W Guo, H Wang, W Cai, J Ma - arXiv preprint arXiv:2411.15447, 2024 - arxiv.org
Vision-to-audio (V2A) synthesis has broad applications in multimedia. Recent
advancements of V2A methods have made it possible to generate relevant audios from …

Machine and Deep Learning Methods for Predicting Immune Checkpoint Blockade Response

D Ho, M Motani - Machine Learning for Health, 2022 - proceedings.mlr.press
Immune checkpoint blockade (ICB) therapy has improved treatment options in various
cancer malignancies and holds promise for increasing the overall survival of treated …

A survey of deep learning audio generation methods

M Božić, M Horvat - arXiv preprint arXiv:2406.00146, 2024 - arxiv.org
This article presents a review of typical techniques used in three distinct aspects of deep
learning model development for audio generation. In the first part of the article, we provide …