J Uszkoreit, JM Ponte, AC Popat… - Proceedings of the 23rd …, 2010 - dl.acm.org
A distributed system is described that reliably mines parallel text from large corpora. The
approach can be regarded as cross-language near-duplicate detection, enabled by an …