Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing

A Chronopoulou, B Thompson, P Mathur… - arXiv preprint arXiv …, 2023 - arxiv.org
Automatic dubbing (AD) is the task of translating the original speech in a video into target
language speech. The new target language speech should satisfy isochrony; that is, the new …

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

C Le, Y Qian, D Wang, L Zhou, S Liu, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
There is a rising interest and trend in research towards directly translating speech from one
language to another, known as end-to-end speech-to-speech translation. However, most …

Biomedical named entity recognition based on multi-cross attention feature fusion

D Zheng, R Han, F Yu, Y Li - PLOS ONE, 2024 - journals.plos.org
Currently, in the field of biomedical named entity recognition, CharCNN (Character-level
Convolutional Neural Networks) or CharRNN (Character-level Recurrent Neural Network) is …

Anim-400K: A Large-Scale Dataset for Automated End to End Dubbing of Video

K Cai, C Liu, DM Chan - ICASSP 2024-2024 IEEE International …, 2024 - ieeexplore.ieee.org
The Internet's wealth of content, with up to 60% published in English, starkly contrasts the
global population, where only 18.8% are English speakers, and just 5.1% consider it their …

SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing

L Xiong, X Cheng, J Tan, X Wu, X Li, L Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Audio-driven talking face generation aims to synthesize video with lip movements
synchronized to input audio. However, current generative techniques face challenges in …

Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning

SR Mhaskar, NJ Shah, M Zaki, AP Gudmalwar… - arXiv preprint arXiv …, 2024 - arxiv.org
Traditional Automatic Video Dubbing (AVD) pipeline consists of three key modules, namely,
Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to …

DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing

N Sahipjohn, A Gudmalwar, N Shah, P Wasnik… - arXiv preprint arXiv …, 2024 - arxiv.org
Audio-visual alignment after dubbing is a challenging research problem. To this end, we
propose a novel method, DubWise Multi-modal Large Language Model (LLM)-based Text-to …

Improving isochronous machine translation with target factors and auxiliary counters

P Pal, B Thompson, Y Virkar, P Mathur… - arXiv preprint arXiv …, 2023 - arxiv.org
To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e.
translated speech needs to be aligned with the source in terms of speech durations. We …

Enabling Global Communication through Automated Real-Time Video Dubbing

K Priya, M Maanesh - 2023 IEEE Technology & Engineering …, 2023 - ieeexplore.ieee.org
In today's digital age, the demand for online video content is skyrocketing. However,
reaching a diverse, multilingual audience poses a significant challenge due to language …

Intelligent English Translation Evaluation System Based on Internet Automation Technology

S Yang - 2023 International Conference on Data Science and …, 2023 - ieeexplore.ieee.org
To reduce manual evaluation work and improve evaluation performance, this article
develops an intelligent English translation evaluation system based on Internet automation …