Target Speech Diarization with Multimodal Prompts

Y Jiang, R Tao, Z Chen, Y Qian, H Li - arXiv preprint arXiv:2406.07198, 2024 - arxiv.org
Traditional speaker diarization seeks to detect "who spoke when" according to speaker
characteristics. Extending to target speech diarization, we detect "when target event …

Cross-Modal Interaction via Reinforcement Feedback for Audio-Lyrics Retrieval

D Zhou, F Lei, L Li, Y Zhou… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
The task of retrieving audio content relevant to lyric queries and vice versa plays a critical
role in music-oriented applications. In this process, robust feature representations have to be …

Speaker-Text Retrieval via Contrastive Learning

X Liu, X Wang, E Cooper, X Miao… - arXiv preprint arXiv …, 2023 - arxiv.org
In this study, we introduce a novel cross-modal retrieval task involving speaker descriptions
and their corresponding audio samples. Utilizing pre-trained speaker and text encoders, we …

Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

Q Wang, JC Gu, ZH Ling - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Audio-text retrieval (ATR), which retrieves a relevant caption given an audio clip (A2T) and
vice versa (T2A), has recently attracted much research attention. Existing methods typically …

Cross-Modal Audio-Text Retrieval via Sequential Feature Augmentation

F Song, J Hu, C Wang, J Huang, H Zhang… - Proceedings of the 2023 …, 2023 - dl.acm.org
The goal of cross-modal audio-text retrieval is to retrieve the target audio clips (textual
descriptions), which should be relevant to a given textual (audial) query. It is a challenging …

Audio-Text Retrieval: Exploring Shared Parameters and Intra-Modal Constraint Loss

V Shah, Y Suryawanshi, S Randar… - … Conference on Advanced …, 2023 - Springer
Cross-modal retrieval involves retrieving information across diverse modalities, like
image-text, image-audio and audio-text. It finds application in multimedia search engines …

Multi-Modal Learning for Machine Listening Systems

HH Wu - 2023 - search.proquest.com
Sound is rich in information and thus crucial to computational perception. Machine listening
systems aim to recreate human perception and reasoning of sound. However, most state-of …