Videodubber: Machine translation with speech-aware length control for video dubbing

A Chronopoulou, B Thompson, P Mathur… - arXiv preprint arXiv …, 2023 - arxiv.org

Automatic dubbing (AD) is the task of translating the original speech in a video into target
language speech. The new target language speech should satisfy isochrony; that is, the new …

Cited by 6 · Related articles · All 2 versions

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

C Le, Y Qian, D Wang, L Zhou, S Liu, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

There is a rising interest and trend in research towards directly translating speech from one
language to another, known as end-to-end speech-to-speech translation. However, most …

Cited by 1 · Related articles · All 2 versions

Biomedical named entity recognition based on multi-cross attention feature fusion

D Zheng, R Han, F Yu, Y Li - Plos one, 2024 - journals.plos.org

Currently, in the field of biomedical named entity recognition, CharCNN (Character-level
Convolutional Neural Networks) or CharRNN (Character-level Recurrent Neural Network) is …

Anim-400K: A Large-Scale Dataset for Automated End to End Dubbing of Video

K Cai, C Liu, DM Chan - ICASSP 2024-2024 IEEE International …, 2024 - ieeexplore.ieee.org

The Internet's wealth of content, with up to 60% published in English, starkly contrasts the
global population, where only 18.8% are English speakers, and just 5.1% consider it their …

Cited by 1 · Related articles · All 3 versions

SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing

L Xiong, X Cheng, J Tan, X Wu, X Li, L Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org

Audio-driven talking face generation aims to synthesize video with lip movements
synchronized to input audio. However, current generative techniques face challenges in …

Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning

SR Mhaskar, NJ Shah, M Zaki, AP Gudmalwar… - arXiv preprint arXiv …, 2024 - arxiv.org

Traditional Automatic Video Dubbing (AVD) pipeline consists of three key modules, namely,
Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to …

DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing

N Sahipjohn, A Gudmalwar, N Shah, P Wasnik… - arXiv preprint arXiv …, 2024 - arxiv.org

Audio-visual alignment after dubbing is a challenging research problem. To this end, we
propose a novel method, DubWise Multi-modal Large Language Model (LLM)-based Text-to …

Improving isochronous machine translation with target factors and auxiliary counters

P Pal, B Thompson, Y Virkar, P Mathur… - arXiv preprint arXiv …, 2023 - arxiv.org

To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e.,
translated speech needs to be aligned with the source in terms of speech durations. We …

Cited by 2 · Related articles · All 7 versions

Enabling Global Communication through Automated Real-Time Video Dubbing

K Priya, M Maanesh - 2023 IEEE Technology & Engineering …, 2023 - ieeexplore.ieee.org

In today's digital age, the demand for online video content is skyrocketing. However,
reaching a diverse, multilingual audience poses a significant challenge due to language …

Intelligent English Translation Evaluation System Based on Internet Automation Technology

S Yang - 2023 International Conference on Data Science and …, 2023 - ieeexplore.ieee.org

To reduce manual evaluation work and improve evaluation performance, this article
develops an intelligent English translation evaluation system based on Internet automation …
