Beyond the status quo: A contemporary survey of advances and challenges in audio captioning

X Xu, Z Xie, M Wu, K Yu - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Automated audio captioning (AAC), a task that mimics human perception as well as
innovatively links audio processing and natural language processing, has overseen much …

Cross-modal incongruity aligning and collaborating for multi-modal sarcasm detection

J Wang, Y Yang, Y Jiang, M Ma, Z Xie, T Li - Information Fusion, 2024 - Elsevier
Sarcasm embodies a linguistic phenomenon that highlights a significant incongruity
between the literal meanings of words and intended attitudes. With the proliferation of image …

How robust are audio embeddings for polyphonic sound event tagging?

J Abeßer, S Grollmisch, M Müller - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
Sound classification algorithms are challenged by the natural variability of everyday sounds,
particularly for large sound class taxonomies. In order to be applicable in real-life …

Efficient Audio Captioning with Encoder-Level Knowledge Distillation

X Xu, H Liu, M Wu, W Wang, MD Plumbley - arXiv preprint arXiv …, 2024 - arxiv.org
Significant improvement has been achieved in automated audio captioning (AAC) with
recent models. However, these models have become increasingly large as their …

Generating Accurate and Diverse Audio Captions through Variational Autoencoder Framework

Y Zhang, R Du, ZH Tan, W Wang… - IEEE Signal Processing …, 2024 - ieeexplore.ieee.org
Generating both diverse and accurate descriptions is an essential goal in the audio
captioning task. Traditional methods mainly focus on improving the accuracy of the …

AFSDCGN: Adaptive Feature Scaling and Dynamic Contextual Graph Networks for image captioning with unseen relationship detection

YA Thakare, KH Walse, M Atique - Multimedia Tools and Applications, 2024 - Springer
Automated image captioning systems play a crucial role in various applications such as
assistive technologies, content indexing, and robotics. However, current frameworks face …

Zero-Shot Audio Captioning Using Soft and Hard Prompts

Y Zhang, X Xu, R Du, H Liu, Y Dong, ZH Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
In traditional audio captioning methods, a model is usually trained in a fully supervised
manner using a human-annotated dataset containing audio-text pairs and then evaluated on …

EDTC: enhance depth of text comprehension in automated audio captioning

L Tan, Y Cao, Y Zhou - arXiv preprint arXiv:2402.17259, 2024 - arxiv.org
Modality discrepancies have perpetually posed significant challenges within the realm of
Automated Audio Captioning (AAC) and across all multi-modal domains. Facilitating models …