pages to their “translations” within a large space of possible pairs. We present four methods. The
first uses term-position similarity between candidate document pairs. The second
method requires automatically translated versions of the target text and matches them against
the candidates. The third and fourth methods aim to overcome some of the challenges posed
by the nature of the corpus by considering the string similarity of the source URL and …