ACTUAL: Audio captioning with caption feature space regularization

X Xu, Z Xie, M Wu, K Yu - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org

Automated audio captioning (AAC), a task that mimics human perception as well as
innovatively links audio processing and natural language processing, has overseen much …

Cited by 6 · Related articles · All 2 versions

Cross-modal incongruity aligning and collaborating for multi-modal sarcasm detection

J Wang, Y Yang, Y Jiang, M Ma, Z Xie, T Li - Information Fusion, 2024 - Elsevier

Sarcasm embodies a linguistic phenomenon that highlights a significant incongruity
between the literal meanings of words and intended attitudes. With the proliferation of image …

Cited by 4 · Related articles · All 2 versions

How robust are audio embeddings for polyphonic sound event tagging?

J Abeßer, S Grollmisch, M Müller - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org

Sound classification algorithms are challenged by the natural variability of everyday sounds,
particularly for large sound class taxonomies. In order to be applicable in real-life …

Cited by 2 · Related articles · All 3 versions

Efficient Audio Captioning with Encoder-Level Knowledge Distillation

X Xu, H Liu, M Wu, W Wang, MD Plumbley - arXiv preprint arXiv …, 2024 - arxiv.org

Significant improvement has been achieved in automated audio captioning (AAC) with
recent models. However, these models have become increasingly large as their …

Generating Accurate and Diverse Audio Captions through Variational Autoencoder Framework

Y Zhang, R Du, ZH Tan, W Wang… - IEEE Signal Processing …, 2024 - ieeexplore.ieee.org

Generating both diverse and accurate descriptions is an essential goal in the audio
captioning task. Traditional methods mainly focus on improving the accuracy of the …

AFSDCGN: Adaptive Feature Scaling and Dynamic Contextual Graph Networks for image captioning with unseen relationship detection

YA Thakare, KH Walse, M Atique - Multimedia Tools and Applications, 2024 - Springer

Automated image captioning systems play a crucial role in various applications such as
assistive technologies, content indexing, and robotics. However, current frameworks face …

Zero-Shot Audio Captioning Using Soft and Hard Prompts

Y Zhang, X Xu, R Du, H Liu, Y Dong, ZH Tan… - arXiv preprint arXiv …, 2024 - arxiv.org

In traditional audio captioning methods, a model is usually trained in a fully supervised
manner using a human-annotated dataset containing audio-text pairs and then evaluated on …

EDTC: enhance depth of text comprehension in automated audio captioning

L Tan, Y Cao, Y Zhou - arXiv preprint arXiv:2402.17259, 2024 - arxiv.org

Modality discrepancies have perpetually posed significant challenges within the realm of
Automated Audio Captioning (AAC) and across all multi-modal domains. Facilitating models …
