Related articles - Academic Resource Search

Findings of the WMT 2020 shared task on parallel corpus filtering and alignment

P Koehn, V Chaudhary, A El-Kishky… - Proceedings of the …, 2020 - aclanthology.org

Following two preceding WMT Shared Task on Parallel Corpus Filtering (Koehn et al., 2018,
2019), we posed again the challenge of assigning sentence-level quality scores for very …

Cited by 75 - Related articles

[PDF] ed.ac.uk

Findings of the WMT 2018 shared task on parallel corpus filtering

P Koehn, H Khayrallah, K Heafield… - EMNLP 2018 Third …, 2018 - research.ed.ac.uk

We posed the shared task of assigning sentence-level quality scores for a very noisy corpus
of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high …

Cited by 120 - Related articles - All 12 versions

[PDF] aclanthology.org

Findings of the WMT 2019 shared task on parallel corpus filtering for low-resource conditions

P Koehn, F Guzmán, V Chaudhary… - Proceedings of the Fourth …, 2019 - aclanthology.org

Following the WMT 2018 Shared Task on Parallel Corpus Filtering, we posed the
challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs …

Cited by 81 - Related articles - All 6 versions

[PDF] aclanthology.org

Prompsit's submission to WMT 2018 parallel corpus filtering shared task

VM Sánchez-Cartagena, M Bañón… - Proceedings of the …, 2018 - aclanthology.org

This paper describes Prompsit Language Engineering's submissions to the WMT
2018 parallel corpus filtering shared task. Our four submissions were based on an automatic …

Cited by 62 - Related articles - All 5 versions

[PDF] arxiv.org

Parallel corpus filtering via pre-trained language models

B Zhang, A Nagesh, K Knight - arXiv preprint arXiv:2005.06166, 2020 - arxiv.org

Web-crawled data provides a good source of parallel corpora for training machine
translation models. It is automatically obtained, but extremely noisy, and recent work shows …

Cited by 31 - Related articles - All 3 versions

[PDF] aclanthology.org

Accurate semantic textual similarity for cleaning noisy parallel corpora using semantic machine translation evaluation metric: The NRC supervised submissions to the …

C Lo, M Simard, D Stewart, S Larkin… - Proceedings of the …, 2018 - aclanthology.org

We present our semantic textual similarity approach in filtering a noisy web crawled parallel
corpus using YiSi—a novel semantic machine translation evaluation metric. The systems …

Cited by 35 - Related articles - All 5 versions

[PDF] helsinki.fi

OpusFilter: A configurable parallel corpus filtering toolbox

M Aulamo, S Virpioja… - … Annual Conference of …, 2020 - researchportal.helsinki.fi

This paper introduces OpusFilter, a flexible and modular toolbox for filtering parallel corpora.
It implements a number of components based on heuristic filters, language identification …

Cited by 38 - Related articles - All 5 versions

[PDF] arxiv.org

Microsoft's submission to the WMT2018 news translation task: How I learned to stop worrying and love the data

M Junczys-Dowmunt - arXiv preprint arXiv:1809.00196, 2018 - arxiv.org

This paper describes the Microsoft submission to the WMT2018 news translation shared
task. We participated in one language direction--English-German. Our system follows …

Cited by 41 - Related articles - All 9 versions

[PDF] arxiv.org

Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation

T Hasan, A Bhattacharjee, K Samin, M Hasan… - arXiv preprint arXiv …, 2020 - arxiv.org

Despite being the seventh most widely spoken language in the world, Bengali has received
much less attention in machine translation literature due to being low in resources. Most …

Cited by 60 - Related articles - All 5 versions

[PDF] aclanthology.org

The impact of sentence alignment errors on phrase-based machine translation performance

C Goutte, M Carpuat, G Foster - … of the 10th conference of the …, 2012 - aclanthology.org

When parallel or comparable corpora are harvested from the web, there is typically a
tradeoff between the size and quality of the data. In order to improve quality, corpus …

Cited by 46 - Related articles - All 8 versions
