Integrating frame-level boundary detection and deepfake detection for locating manipulated regions in partially spoofed audio forgery attacks

Z Cai, M Li - Computer Speech & Language, 2024 - Elsevier
Partially fake audio, a variant of deep fake that involves manipulating audio utterances
through the incorporation of fake or externally-sourced bona fide audio clips, constitutes a …

PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features

T Han, S Gui, Y Huang, B Li, L Liu… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Speech-driven 3D facial animation has improved a lot recently while most related works only
utilize acoustic modality and neglect the influence of visual and textual cues, leading to …

Incorporating Speaker's Speech Rate Features for Improved Voice Cloning

Q Zhe, I Katunobu - 2023 9th International Conference on …, 2023 - ieeexplore.ieee.org
We investigate a neural network-based text-to-speech (TTS) synthesis system that aims to
simulate the Mandarin voice of different speakers using short voice samples. Our system …

Advancing Deep-Generated Speech and Defending against Its Misuse

Z Cai - 2023 - search.proquest.com
Deep learning has revolutionized speech generation, spanning synthesis areas such as
text-to-speech and voice conversion, leading to diverse advancements. On the one hand, when …