Self-supervised acoustic word embedding learning via correspondence transformer encoder

J Lin, X Yue, J Ao, H Li - arXiv preprint arXiv:2307.09871, 2023 - arxiv.org
Acoustic word embeddings (AWEs) aim to map a variable-length speech segment into a
fixed-dimensional representation. High-quality AWEs should be invariant to variations, such …

Polyscriber: Integrated fine-tuning of extractor and lyrics transcriber for polyphonic music

X Gao, C Gupta, H Li - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Lyrics transcription of polyphonic music is challenging as the background music affects lyrics
intelligibility. Typically, lyrics transcription can be performed by a two-step pipeline, i.e., a …

Automatic Lyric Transcription and Automatic Music Transcription from Multimodal Singing

X Gu, L Ou, W Zeng, J Zhang, N Wong… - ACM Transactions on …, 2024 - dl.acm.org
Automatic lyric transcription (ALT) refers to transcribing singing voices into lyrics, while
automatic music transcription (AMT) refers to transcribing singing voices into note events, i.e., …

Spotting parodies: Detecting alignment collapse between lyrics and singing voice

T Ariga, Y Higuchi, M Kanno, R Shigyo… - 2023 31st European …, 2023 - ieeexplore.ieee.org
We present a method for detecting parodies in karaoke singing by evaluating alignment
collapse between lyrics and singing voice. Parody detection is a crucial technique for …