A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in
low-resource domains, new tasks, and the popularity of large-scale neural networks that require …

Segmentation-Free Streaming Machine Translation

J Iranzo-Sánchez, J Iranzo-Sánchez… - Transactions of the …, 2024 - direct.mit.edu
Streaming Machine Translation (MT) is the task of translating an unbounded input
text stream in real-time. The traditional cascade approach, which combines an Automatic …

From simultaneous to streaming machine translation by leveraging streaming history

J Iranzo-Sánchez, J Civera, A Juan - arXiv preprint arXiv:2203.02459, 2022 - arxiv.org
Simultaneous Machine Translation is the task of incrementally translating an input sentence
before it is fully available. Currently, simultaneous translation is carried out by translating …

Improved long-form spoken language translation with large language models

AD McCarthy, H Zhang, S Kumar, F Stahlberg… - arXiv preprint arXiv …, 2022 - arxiv.org
A challenge in spoken language translation is that plenty of spoken content is long-form, but
short units are necessary for obtaining high-quality translations. To address this mismatch …

Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

AD McCarthy, H Zhang, S Kumar… - Findings of the …, 2023 - aclanthology.org
One challenge in speech translation is that plenty of spoken content is long-form, but short
units are necessary for obtaining high-quality translations. To address this mismatch, we …

Neural Machine Translation for Low-Resource Languages from a Chinese-centric Perspective: A Survey

J Zhang, K Su, H Li, J Mao, Y Tian, F Wen… - ACM Transactions on …, 2024 - dl.acm.org
Machine translation—the automatic transformation of one natural language (source
language) into another (target language) through computational means—occupies a central …

Fine-Tuning Multilingual Pretrained African Language Models

RL Myoya, F Banda, V Marivate… - 4th Workshop on African …, 2023 - openreview.net
With the recent increase in low-resource African language text corpora, there have been
advancements which have led to development of multilingual pre-trained language models …

A multimodal simultaneous interpretation prototype: Who said what

X Wang, M Utiyama, E Sumita - … of the 15th Biennial Conference of …, 2022 - aclanthology.org
“Who said what” is essential for users to understand video streams that have more
than one speaker, but conventional simultaneous interpretation systems merely present …

Manipulating Data Representations for Neural Machine Translation

C Amrhein - 2023 - zora.uzh.ch
In natural language processing, much current research focuses on training larger and larger
models on more and more data. In this thesis, we argue that how data is represented can …

Robust Translation of French Live Speech Transcripts

E Bertin-Lemée, G Klein, JM Crego… - Proceedings of the 15th …, 2022 - aclanthology.org
Despite a narrowed performance gap with direct approaches, cascade solutions, involving
automatic speech recognition (ASR) and machine translation (MT) are still largely employed …