T Hu, X Xiang, J Qin, Y Tan - Multimedia Systems, 2023 - Springer
Existing research on audio–text retrieval is limited by the size of the dataset and the structure
of the network, which makes it difficult to learn ideal audio and text features and results in low …