The ILSP/ARC submission to the WMT 2018 Parallel Corpus Filtering Shared Task

V Papavassiliou, S Sofianopoulos… - Proceedings of the …, 2018 - aclanthology.org
Proceedings of the Third Conference on Machine Translation: Shared …, 2018 - aclanthology.org
Abstract
This paper describes the submission of the Institute for Language and Speech Processing/Athena Research and Innovation Center (ILSP/ARC) to the WMT 2018 Parallel Corpus Filtering shared task. We discuss several properties of sentences and sentence pairs that our system examined in the context of the task, with the purpose of clustering sentence pairs according to their appropriateness for training MT systems. We also discuss alternative methods for ranking the sentence pairs of the most appropriate clusters, with the aim of generating the two datasets (of 10 and 100 million words, as required in the task) that were evaluated. Summarizing the results of several experiments carried out by the organizers during the evaluation phase, our submission achieved an average BLEU score of 26.41, while the average score of the best participating system was 27.91. Our submission reached this score even though it does not make use of any language-specific resources such as bilingual lexica, monolingual corpora, or MT output.