ParaCrawl: Web-scale acquisition of parallel corpora

M Bañón, P Chen, B Haddow, K Heafield, H Hoang… - 2020 - strathprints.strath.ac.uk
… alignment and sentence pair filtering. We also describe the … : The AFRL submission to the
WMT19 parallel corpus filtering for … filtering system for the WMT 2018 parallel corpus filtering task. …

The ARC-NKUA submission for the English-Ukrainian General Machine Translation Shared Task at WMT22

D Roussis, V Papavassiliou - … on Machine Translation (WMT), 2022 - aclanthology.org
… Furthermore, we discuss filtering techniques and the acquisition of additional data used for
… the parallel and monolingual corpora, as well as the acquisition, selection, filtering and pre-…

BitextEdit: Automatic bitext editing for improved low-resource machine translation

E Briakou, SI Wang, L Zettlemoyer… - arXiv preprint arXiv …, 2021 - arxiv.org
2018). Past submissions to the Parallel Corpus Filtering WMT shared task employ a diverse
set of approaches covering simple pre-filtering rules based on language identifiers and …

Building End-to-End Neural Machine Translation Systems for Crisis Scenarios: The Case of COVID-19

DG Roussis - 2022 - core.ac.uk
… This also led to the submission of two NMT systems at the … in this thesis, we the 2018 version
which was extracted from … EN-EL parallel corpus since we apply different filtering methods (…

Exploiting bilingual lexicons to improve multilingual embedding-based document and sentence alignment for low-resource languages

A Fernando, S Ranathunga, D Sachintha… - … and Information Systems, 2023 - Springer
parallel corpus mining pipeline follows a sequence of tasks, … , sentence alignment and
parallel sentence filtration [5]. In our … S (2016) The ILSP/ARC submission to the WMT 2016 bilingual …

Making the most of comparable corpora in Neural Machine Translation: a case study

H Gete, T Etchegoyhen - Language Resources and Evaluation, 2022 - Springer
… (2018), Footnote 5 to which we will refer as ode in what … dataset without length filtering
and not for the filtered variant, we … parallel corpora with synthetic translations (Edunov et al., …

Handle with care: A case study in comparable corpora exploitation for neural machine translation

T Etchegoyhen, H Gete - … of the Twelfth Language Resources and …, 2020 - aclanthology.org
… In particular, we show that filtering in terms of alignment thresholds and … ; Khayrallah and
Koehn, 2018). Due to the nature of the … The ILSP/ARC submission to the WMT 2016 bilingual docu- …