H Açarçiçek, T Çolakoğlu, PEA Hatipoğlu… - Proceedings of the …, 2020 - aclanthology.org
This paper illustrates Huawei's submission to the WMT20 low-resource parallel corpus filtering shared task. Our approach focuses on developing a proxy task learner on top of a …
We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high …
Recent machine translation algorithms mainly rely on parallel corpora. However, since the availability of parallel corpora remains limited, only some resource-rich language pairs can …
Following two preceding WMT Shared Tasks on Parallel Corpus Filtering (Koehn et al., 2018, 2019), we posed again the challenge of assigning sentence-level quality scores for very …
Scaling natural language processing (NLP) to low‐resourced languages to improve machine translation (MT) performance remains enigmatic. This research contributes to the …
M Pinnis - Proceedings of the Third Conference on Machine …, 2018 - aclanthology.org
The paper describes parallel corpus filtering methods that allow reducing noise of noisy “parallel” corpora from a level where the corpora are not usable for neural machine …
P Koehn, F Guzmán, V Chaudhary… - Proceedings of the Fourth …, 2019 - aclanthology.org
Following the WMT 2018 Shared Task on Parallel Corpus Filtering, we posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs …
Neural machine translation, which achieves near human-level performance in some languages, relies heavily on large amounts of parallel sentences, which hinders its …
This paper describes the submission of RWTH Aachen University for the De→En parallel corpus filtering task of the EMNLP 2018 Third Conference on Machine Translation (WMT …