Parallel strands: A preliminary investigation into mining the web for bilingual text

P Resnik - Conference of the Association for Machine Translation …, 1998 - Springer
Parallel corpora are a valuable resource for machine translation, but at present their
availability and utility is limited by genre- and domain-specificity, licensing restrictions, and …

Mining the web for bilingual text

P Resnik - Proceedings of the 37th annual meeting of the …, 1999 - aclanthology.org
STRAND (Resnik, 1998) is a language-independent system for automatic discovery
of text in parallel translation on the World Wide Web. This paper extends the preliminary …

The web as a parallel corpus

P Resnik, NA Smith - Computational Linguistics, 2003 - direct.mit.edu
Parallel corpora have become an essential resource for work in multilingual natural
language processing. In this article, we report on our work using the STRAND system for …

Large scale parallel document mining for machine translation

J Uszkoreit, J Ponte, A Popat… - Proceedings of the 23rd …, 2010 - aclanthology.org
A distributed system is described that reliably mines parallel text from large corpora. The
approach can be regarded as cross-language near-duplicate detection, enabled by an …

Mining parallel fragments from comparable texts

M Cettolo, M Federico, N Bertoldi - Proceedings of the 7th …, 2010 - aclanthology.org
This paper proposes a novel method for exploiting comparable documents to generate
parallel data for machine translation. First, each source document is paired to each sentence …

Building a web-based parallel corpus and filtering out machine-translated text

A Antonova, A Misyurev - Proceedings of the 4th Workshop on …, 2011 - aclanthology.org
We describe a set of techniques that have been developed while collecting parallel texts for
Russian-English language pair and building a corpus of parallel sentences for training a …

Automatic construction of English/Chinese parallel corpora

CC Yang, KW Li - Journal of the American society for …, 2003 - Wiley Online Library
As the demand for global information increases significantly, multilingual corpora has
become a valuable linguistic resource for applications to cross‐lingual information retrieval …

BITS: A method for bilingual text search over the web

X Ma, M Liberman - Proceedings of Machine Translation Summit …, 1999 - aclanthology.org
Parallel corpus are valuable resource for machine translation, multi-lingual text retrieval,
language education and other applications, but for various reasons, its availability is very …

Empirical methods for exploiting parallel texts

ID Melamed - 2001 - books.google.com
This book lays out the theory and the practical techniques for discovering and applying
translational equivalence at the lexical level. Parallel texts (bitexts) are a goldmine of …

ParaCrawl: Web-scale acquisition of parallel corpora

M Bañón, P Chen, B Haddow, K Heafield, H Hoang… - 2020 - strathprints.strath.ac.uk
We report on methods to create the largest publicly available parallel corpora by crawling
the web, using open source software. We empirically compare alternative methods and …