Beyond English-centric multilingual machine translation

A Fan, S Bhosale, H Schwenk, Z Ma, A El-Kishky… - Journal of Machine …, 2021 - jmlr.org
Existing work in translation demonstrated the potential of massively multilingual machine
translation by training a single model able to translate between any pair of languages …

Findings of the 2017 Conference on Machine Translation (WMT17)

O Bojar, R Chatterjee, C Federmann, Y Graham… - 2017 - doras.dcu.ie
This paper presents the results of the WMT17 shared tasks, which included three machine
translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and …

Survey of low-resource machine translation

B Haddow, R Bawden, AVM Barone, J Helcl… - Computational …, 2022 - direct.mit.edu
We present a survey covering the state of the art in low-resource machine translation (MT)
research. There are currently around 7,000 languages spoken in the world and almost all …

WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia

H Schwenk, V Chaudhary, S Sun, H Gong… - arXiv preprint arXiv …, 2019 - arxiv.org
We present an approach based on multilingual sentence embeddings to automatically
extract parallel sentences from the content of Wikipedia articles in 85 languages, including …

ParaCrawl: Web-scale acquisition of parallel corpora

M Bañón, P Chen, B Haddow, K Heafield, H Hoang… - 2020 - strathprints.strath.ac.uk
We report on methods to create the largest publicly available parallel corpora by crawling
the web, using open source software. We empirically compare alternative methods and …

CCMatrix: Mining billions of high-quality parallel sentences on the web

H Schwenk, G Wenzek, S Edunov, E Grave… - arXiv preprint arXiv …, 2019 - arxiv.org
We show that margin-based bitext mining in a multilingual sentence space can be applied to
monolingual corpora of billions of sentences. We are using ten snapshots of a curated …

Domain adaptation and multi-domain adaptation for neural machine translation: A survey

D Saunders - Journal of Artificial Intelligence Research, 2022 - jair.org
The development of deep learning techniques has allowed Neural Machine Translation
(NMT) models to become extremely powerful, given sufficient training data and training time …

CCAligned: A massive collection of cross-lingual web-document pairs

A El-Kishky, V Chaudhary, F Guzmán… - arXiv preprint arXiv …, 2019 - arxiv.org
Cross-lingual document alignment aims to identify pairs of documents in two distinct
languages that are of comparable content or translations of each other. In this paper, we …

MUSS: Multilingual unsupervised sentence simplification by mining paraphrases

L Martin, A Fan, É De La Clergerie, A Bordes… - arXiv preprint arXiv …, 2020 - arxiv.org
Progress in sentence simplification has been hindered by a lack of labeled parallel
simplification data, particularly in languages other than English. We introduce MUSS, a …

I don't need an expert! Making URL phishing features human comprehensible

K Althobaiti, N Meng, K Vaniea - … of the 2021 CHI Conference on Human …, 2021 - dl.acm.org
Judging the safety of a URL is something that even security experts struggle to do accurately
without additional information. In this work, we aim to make experts' tools accessible to non …