J Uszkoreit, J Ponte, A Popat… - Proceedings of the 23rd …, 2010 - aclanthology.org
A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an …
K Darwish, W Magdy - Foundations and Trends® in …, 2014 - nowpublishers.com
In the past several years, Arabic Information Retrieval (IR) has garnered significant attention. The main research interests have focused on retrieval of formal language, mostly in the …
W Lewis, R Munro, S Vogel - … of the Sixth Workshop on Statistical …, 2011 - aclanthology.org
In this paper, we propose that MT is an important technology in crisis events, something that can and should be an integral part of a rapid-response infrastructure. By integrating MT …
A El-Kishky, F Guzmán - arXiv preprint arXiv:2002.00761, 2020 - arxiv.org
Document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other. Such aligned data can be used for a …
Users of the WWW across the globe are increasing rapidly. According to Internet Live Stats, there are more than 3 billion Internet users worldwide today and the number of non-English …
This report documents the details of the Transliteration Mining Shared Task that was run as a part of the Named Entities Workshop (NEWS 2010), an ACL 2010 workshop. The shared …
NT Le, F Sadat - Proceedings of the Seventh Named Entities …, 2018 - aclanthology.org
Grapheme-to-phoneme models are key components in automatic speech recognition and text-to-speech systems. With low-resource language pairs that do not have available and …
Today, parallel corpus-based systems dominate the transliteration landscape. But the resource-scarce languages do not enjoy the luxury of large parallel transliteration corpus …