L Tan, Y Cao, Y Zhou - arXiv preprint arXiv:2402.17259, 2024 - arxiv.org
Modality discrepancies have perpetually posed significant challenges within the realm of
Automated Audio Captioning (AAC) and across all multi-modal domains. Facilitating models …