Diverse audio captioning via adversarial training

Authors

Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D Plumbley, Wenwu Wang

Publication date

2022/5/23

Conference

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pages

8882-8886

Publisher

IEEE

Description

Audio captioning aims at generating natural language descriptions for audio clips automatically. Existing audio captioning models have shown promising improvement in recent years. However, these models are mostly trained via maximum likelihood estimation (MLE), which tends to make captions generic, simple and deterministic. As different people may describe an audio clip from different aspects using distinct words and grammars, we argue that an audio captioning system should have the ability to generate diverse captions for a fixed audio clip and across similar audio clips. To address this problem, we propose an adversarial training framework for audio captioning based on a conditional generative adversarial network (C-GAN), which aims at improving the naturalness and diversity of generated captions. Unlike processing data of continuous values in a classical GAN, a sentence is composed of discrete …

Total citations

Cited by 26

Citations per year: 2022: 9, 2023: 9, 2024: 8
