X-vectors: Robust DNN embeddings for speaker recognition

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 150 - Related articles - All 6 versions

[PDF] researchgate.net

A survey on differential privacy for unstructured data content

Y Zhao, J Chen - ACM Computing Surveys (CSUR), 2022 - dl.acm.org

Huge amounts of unstructured data including image, video, audio, and text are ubiquitously
generated and shared, and it is a challenge to protect sensitive personal information in …

Cited by 211 - Related articles - All 4 versions

[HTML] pianshen.com

[HTML] Self-attentive speaker embeddings for text-independent speaker verification

Y Zhu, T Ko, D Snyder, B Mak, D Povey - Interspeech, 2018 - pianshen.com

Abstract: This paper introduces a new method to extract speaker embeddings from a deep
neural network (DNN) for text-independent speaker verification. Usually, speaker …

Cited by 302 - Related articles - All 15 versions

[PDF] thecvf.com

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Cited by 809 - Related articles - All 13 versions

[PDF] mlr.press

YourTTS: Towards zero-shot multi-speaker TTS and zero-shot voice conversion for everyone

E Casanova, J Weber, CD Shulby… - International …, 2022 - proceedings.mlr.press

YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker
TTS. Our method builds upon the VITS model and adds several novel modifications for zero …

Cited by 359 - Related articles - All 7 versions

[PDF] arxiv.org

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org

SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Cited by 653 - Related articles - All 5 versions

[PDF] arxiv.org

SUPERB: Speech processing universal performance benchmark

S Yang, PH Chi, YS Chuang, CIJ Lai… - arXiv preprint arXiv …, 2021 - arxiv.org

Self-supervised learning (SSL) has proven vital for advancing research in natural language
processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on …

Cited by 860 - Related articles - All 11 versions

[PDF] ieee.org

ICASSP 2023 deep noise suppression challenge

H Dubey, A Aazami, V Gopal, B Naderi… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org

The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the
DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster …

Cited by 212 - Related articles - All 14 versions

[PDF] arxiv.org

ECAPA-TDNN: Emphasized channel attention, propagation and aggregation in TDNN based speaker verification

B Desplanques, J Thienpondt, K Demuynck - arXiv preprint arXiv …, 2020 - arxiv.org

Current speaker verification techniques rely on a neural network to extract speaker
representations. The successful x-vector architecture is a Time Delay Neural Network …

Cited by 1457 - Related articles - All 15 versions

[PDF] hal.science

ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection

J Yamagishi, X Wang, M Todisco… - ASVspoof 2021 …, 2021 - inria.hal.science

ASVspoof 2021 is the fourth edition in the series of biannual challenges which aim to promote
the study of spoofing and the design of countermeasures to protect automatic speaker …

Cited by 333 - Related articles - All 11 versions
