subtitles found on the Web. These subtitles come in heterogeneous formats and encodings. In the first step,
we convert them into our XML-based multilingual subtitle format. In the second step, we align
the subtitle sentences with their on-screen display times. We implemented a tool, Jimaku, to
perform these steps semi-automatically. The last step consists of
aligning the sentences at the sub-sentence level and indexing the corpus for contextual …