Automated audio captioning: An overview of recent progress and new challenges

X Mei, X Liu, MD Plumbley, W Wang - … journal on audio, speech, and music …, 2022 - Springer
Automated audio captioning is a cross-modal translation task that aims to generate natural
language descriptions for given audio clips. This task has received increasing attention with …

A comprehensive review of polyphonic sound event detection

TK Chan, CS Chin - IEEE Access, 2020 - ieeexplore.ieee.org
One of the most amazing functions of the human auditory system is the ability to detect all
kinds of sound events in the environment. With the technologies and hardware advances …

Weakly-supervised sound event detection with self-attention

K Miyazaki, T Komatsu, T Hayashi… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
In this paper, we propose a novel sound event detection (SED) method that incorporates a
self-attention mechanism of the Transformer for a weakly-supervised learning scenario. The …

Towards duration robust weakly supervised sound event detection

H Dinkel, M Wu, K Yu - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org
Sound event detection (SED) is the task of tagging the absence or presence of audio events
and their corresponding interval within a given audio clip. While SED can be done using …

Improving smart cities safety using sound events detection based on deep neural network algorithms

G Ciaburro, G Iannace - Informatics, 2020 - mdpi.com
In recent years, security in urban areas has gradually assumed a central position, focusing
increasing attention on citizens, institutions and political forces. Security problems have a …

An intelligent system for grinding wheel condition monitoring based on machining sound and deep learning

CH Lee, JS Jwo, HY Hsieh, CS Lin - IEEE Access, 2020 - ieeexplore.ieee.org
Immediate monitoring of the conditions of the grinding wheel during the grinding process is
important because it directly affects the surface accuracy of the workpiece. Because the …

Voice activity detection in the wild: A data-driven approach using teacher-student training

H Dinkel, S Wang, X Xu, M Wu… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Voice activity detection is an essential pre-processing component for speech-related tasks
such as automatic speech recognition (ASR). Traditional supervised VAD systems obtain …

VoiceFixer: Toward general speech restoration with neural vocoder

H Liu, Q Kong, Q Tian, Y Zhao, DL Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus
on single-task speech restoration (SSR), such as speech denoising or speech declipping …

Audio caption: Listen and tell

M Wu, H Dinkel, K Yu - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org
Increasing amount of research has shed light on machine perception of audio events, most
of which concerns detection and classification tasks. However, human-like perception of …

Attention-based atrous convolutional neural networks: Visualisation and understanding perspectives of acoustic scenes

Z Ren, Q Kong, J Han, MD Plumbley… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
The goal of Acoustic Scene Classification (ASC) is to recognise the environment in which an
audio waveform has been recorded. Recently, deep neural networks have been applied to …