Disentangling voice and content with self-supervision for speaker recognition

T Liu, KA Lee, Q Wang, H Li - Advances in Neural …, 2023 - proceedings.neurips.cc
For speaker recognition, it is difficult to extract an accurate speaker representation from
speech because of its mixture of speaker traits and content. This paper proposes a …

Self-supervised speaker recognition with loss-gated learning

R Tao, KA Lee, RK Das… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
In self-supervised learning for speaker recognition, pseudo labels are useful as the
supervision signals. It is a known fact that a speaker recognition model doesn't always …

Automatic lyrics transcription of polyphonic music with lyrics-chord multi-task learning

X Gao, C Gupta, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Lyrics are the words that make up a song, while chords are harmonic sets of multiple notes
in music. Lyrics and chords are generally essential information in music, i.e. unaccompanied …

Selective listening by synchronizing speech with lips

Z Pan, R Tao, C Xu, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-
talker speech mixture when given a cue that represents the target speaker, such as a pre …

RefXVC: Cross-lingual voice conversion with enhanced reference leveraging

M Zhang, Y Zhou, Y Ren, C Zhang… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org
This paper proposes RefXVC, a method for cross-lingual voice conversion (XVC) that
leverages reference information to improve conversion performance. Previous XVC works …

Decoding knowledge transfer for neural text-to-speech training

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Neural end-to-end text-to-speech (TTS) is superior to conventional statistical methods in
many ways. However, the exposure bias problem, that arises from the mismatch between …

Enhancing Automatic Speech Recognition With Personalized Models: Improving Accuracy Through Individualized Fine-Tuning

V Brydinskyi, D Sabodashko, Y Khoma… - IEEE …, 2024 - ieeexplore.ieee.org
Automatic speech recognition (ASR) systems have become increasingly popular in recent
years due to their ability to convert spoken language into text. Nonetheless, despite their …

TTS-guided training for accent conversion without parallel data

Y Zhou, Z Wu, M Zhang, X Tian… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org
Accent Conversion (AC) seeks to change the accent of speech from one (source) to another
(target) while preserving the speech content and speaker identity. However, many existing …

Optimization of cross-lingual voice conversion with linguistics losses to reduce foreign accents

Y Zhou, Z Wu, X Tian, H Li - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Cross-lingual voice conversion (XVC) transforms the speaker identity of a source speaker to
that of a target speaker who speaks a different language. Due to the intrinsic differences …

A survey on Artificial Intelligent based solutions using Augmentative and Alternative Communication for Speech Disabled

B Evangeline - 2022 - researchsquare.com
This paper aims to analyse how innovative Artificial Intelligence (AI) systems for non-
standard speech recognition may revolutionize Augmentative Alternative Communication …