Findings of the WMT 2019 shared task on parallel corpus filtering for low-resource conditions

S Ranathunga, ESA Lee, M Prifti Skenduli… - ACM Computing …, 2023 - dl.acm.org

Neural Machine Translation (NMT) has seen tremendous growth in the last ten years since
the early 2000s and has already entered a mature phase. While considered the most widely …

Cited by 176 Related articles All 6 versions

[PDF] jair.org

Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org

Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Cited by 118 Related articles All 6 versions

[PDF] arxiv.org

No language left behind: Scaling human-centered machine translation

MR Costa-jussà, J Cross, O Çelebi, M Elbayad… - arXiv preprint arXiv …, 2022 - arxiv.org

Driven by the goal of eradicating language barriers on a global scale, machine translation
has solidified itself as a key focus of artificial intelligence research today. However, such …

Cited by 477 Related articles All 2 versions

[PDF] jmlr.org

Beyond English-centric multilingual machine translation

A Fan, S Bhosale, H Schwenk, Z Ma, A El-Kishky… - Journal of Machine …, 2021 - jmlr.org

Existing work in translation demonstrated the potential of massively multilingual machine
translation by training a single model able to translate between any pair of languages …

Cited by 672 Related articles All 9 versions

[PDF] uzh.ch

Findings of the 2019 conference on machine translation (WMT19)

L Barrault, O Bojar, MR Costa-Jussa, C Federmann… - 2019 - zora.uzh.ch

This paper presents the results of the premier shared task organized alongside the
Conference on Machine Translation (WMT) 2019. Participants were asked to build machine …

Cited by 721 Related articles All 13 versions

[PDF] mit.edu

Survey of low-resource machine translation

B Haddow, R Bawden, AVM Barone, J Helcl… - Computational …, 2022 - direct.mit.edu

We present a survey covering the state of the art in low-resource machine translation (MT)
research. There are currently around 7,000 languages spoken in the world and almost all …

Cited by 124 Related articles All 13 versions

[PDF] arxiv.org

WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia

H Schwenk, V Chaudhary, S Sun, H Gong… - arXiv preprint arXiv …, 2019 - arxiv.org

We present an approach based on multilingual sentence embeddings to automatically
extract parallel sentences from the content of Wikipedia articles in 85 languages, including …

Cited by 312 Related articles All 5 versions

[PDF] strath.ac.uk

ParaCrawl: Web-scale acquisition of parallel corpora

M Bañón, P Chen, B Haddow, K Heafield, H Hoang… - 2020 - strathprints.strath.ac.uk

We report on methods to create the largest publicly available parallel corpora by crawling
the web, using open source software. We empirically compare alternative methods and …

Cited by 220 Related articles All 17 versions

[PDF] arxiv.org

Detecting hallucinated content in conditional neural sequence generation

C Zhou, G Neubig, J Gu, M Diab, P Guzman… - arXiv preprint arXiv …, 2020 - arxiv.org

Neural sequence models can generate highly fluent sentences, but recent studies have also
shown that they are also prone to hallucinate additional content not supported by the input …

Cited by 155 Related articles All 6 versions

[PDF] arxiv.org

CCMatrix: Mining billions of high-quality parallel sentences on the web

H Schwenk, G Wenzek, S Edunov, E Grave… - arXiv preprint arXiv …, 2019 - arxiv.org

We show that margin-based bitext mining in a multilingual sentence space can be applied to
monolingual corpora of billions of sentences. We are using ten snapshots of a curated …

Cited by 201 Related articles All 5 versions
