Y Xie, Z Zhu, X Zhuang, L Liang, Z Wang… - Proc. Interspeech …, 2024 - pkusz.edu.cn
Abstract: Recent Audio-Text Retrieval (ATR) models have achieved progressive results,
which pursue semantic interaction upon audio and text pairs. To clarify this coarse-grained …